Abstract

Let $K$ be an isotropic convex body in $\mathbb{R}^n$. Given $\varepsilon > 0$, how
many independent points $X_i$ uniformly distributed on $K$ are needed
for the empirical covariance matrix to approximate the identity up
to $\varepsilon$ with overwhelming probability? Our paper answers this question
from [12]. More precisely, let $X \in \mathbb{R}^n$ be a centered random
vector with a log-concave distribution and with the identity as covariance
matrix. An example of such a vector $X$ is a random point in
an isotropic convex body. We show that for any $\varepsilon > 0$, there exists
$C(\varepsilon) > 0$, such that if $N \ge C(\varepsilon)\, n$ and $(X_i)_{i \le N}$ are i.i.d. copies of
$X$, then

$$\Big\| \frac{1}{N} \sum_{i=1}^{N} X_i \otimes X_i - \mathrm{Id} \Big\| \le \varepsilon$$

with probability larger than $1 - \exp(-c\sqrt{n})$.



AMS Classification: primary 52A20, 46B09, 52A21 secondary 15A52, 
60E15 

Keywords: convex bodies, log-concave measures, isotropic measures, 
random matrices, norm of random matrices, uniform laws of large 
numbers, approximation of covariance matrices 



^Work on this paper began when this author held a postdoctoral position at the De- 
partment of Mathematical and Statistical Sciences, University of Alberta in Edmonton, 
Alberta. The position was partially sponsored by the Pacific Institute for the Mathemat- 
ical Sciences. 

^This author holds the Canada Research Chair in Geometric Analysis. 






1 Introduction 



Let $X \in \mathbb{R}^n$ be a centered random vector with covariance matrix $\Sigma$ and
consider $N$ independent random vectors $(X_i)_{i \le N}$ distributed as $X$. By the law
of large numbers, the empirical covariance matrix $\frac{1}{N}\sum_{i=1}^{N} X_i \otimes X_i$ converges
to $\mathbb{E}\, X \otimes X = \Sigma$ as $N \to \infty$. Our aim is to give a quantitative estimate of
the rate of this convergence, that is, to estimate the size $N$ of the sample for
which

$$\Big\| \frac{1}{N} \sum_{i=1}^{N} X_i \otimes X_i - \Sigma \Big\| \le \varepsilon \|\Sigma\| \qquad (1.1)$$

holds with high probability.
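To make the quantity in (1.1) concrete, the following is a minimal sketch in plain Python that computes the operator norm of the difference between the empirical covariance matrix and the identity. The function names, the power-iteration routine and the Gaussian sampler are illustrative assumptions, not part of the paper; the power iteration is only an approximation and assumes the random starting vector is not orthogonal to the top eigenspace.

```python
import math
import random


def empirical_covariance(samples):
    """Return (1/N) * sum_i x_i x_i^T as a list of rows."""
    dim = len(samples[0])
    count = len(samples)
    cov = [[0.0] * dim for _ in range(dim)]
    for x in samples:
        for a in range(dim):
            for b in range(dim):
                cov[a][b] += x[a] * x[b] / count
    return cov


def symmetric_operator_norm(mat, iterations=500, seed=0):
    """Approximate max |eigenvalue| of a symmetric matrix by power iteration."""
    dim = len(mat)
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    estimate = 0.0
    for _ in range(iterations):
        length = math.sqrt(sum(c * c for c in v))
        if length == 0.0:
            return 0.0  # the matrix annihilated the iterate
        v = [c / length for c in v]
        v = [sum(mat[a][b] * v[b] for b in range(dim)) for a in range(dim)]
        estimate = math.sqrt(sum(c * c for c in v))  # |M v| for a unit vector v
    return estimate


def covariance_error(samples):
    """Operator norm of (empirical covariance - identity), as in (1.1) with Sigma = Id."""
    dim = len(samples[0])
    cov = empirical_covariance(samples)
    diff = [[cov[a][b] - (1.0 if a == b else 0.0) for b in range(dim)]
            for a in range(dim)]
    return symmetric_operator_norm(diff)


def gaussian_sample(dim, count, seed=1):
    """Hypothetical helper: i.i.d. standard Gaussian vectors (an isotropic example)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(count)]
```

Evaluating `covariance_error(gaussian_sample(5, N))` for increasing `N` gives a rough feeling for the rate of convergence discussed here; it is an experiment, not a proof.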

This question was investigated in [12] motivated by a problem of com- 
plexity in computing volume in high dimension. In particular the authors 
proved that 



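$$\mathbb{E}\, \Big\| \frac{1}{N} \sum_{i=1}^{N} X_i \otimes X_i - \Sigma \Big\|^2 \le C\, \frac{n^2}{N}\, \|\Sigma\|^2,$$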



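where $C = \max_{i \le N} \mathbb{E}|X_i|^4 / (\mathbb{E}|X_i|^2)^2$. Chebyshev's inequality then yields a
first estimate: for any $\varepsilon > 0$, $\delta \in (0, 1)$,

$$\mathbb{P}\Big( \Big\| \frac{1}{N} \sum_{i=1}^{N} X_i \otimes X_i - \Sigma \Big\| \le \varepsilon \|\Sigma\| \Big) \ge 1 - \delta \qquad (1.2)$$

whenever $N \ge \frac{C}{\varepsilon^2 \delta}\, n^2$.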

When random vectors are standard Gaussian, the covariance matrix is
the identity and it is known (see the survey [8]) that (1.1) holds with high
probability whenever $N \ge Cn/\varepsilon^2$. This raises the question about the order
of the best $N$. In particular can it be proportional to $n$, under reasonable
assumptions? More precisely, the question in [12] was phrased in the following
setting.

Let $K \subset \mathbb{R}^n$ be a convex body and let $X \in K$ be a random point
uniformly distributed on $K$. Suppose that $X$ is centered at 0 and that the
covariance matrix of $X$ is the identity of $\mathbb{R}^n$. In such a case we shall say that
$X$ (or $K$) is isotropic. Note that any convex body with non-empty interior
has an affine isotropic image. In this setting and under these assumptions,
the question may be stated as follows:

Question: ([12]) Let $K$ be an isotropic convex body in $\mathbb{R}^n$. Given $\varepsilon > 0$,
how many independent points $X_i$ uniformly distributed on $K$ are needed for
the empirical covariance matrix to approximate the identity up to $\varepsilon$ with
overwhelming probability?

Our main aim in this paper is to answer this question. As is well
known to specialists, a good framework for this kind of geometric probabilistic
question is given by log-concave distributions (see below for the definition).
This is a stable and well structured class of measures in $\mathbb{R}^n$ that contains
uniform measures on convex bodies. Thus our goal is to estimate

$$\mathbb{P}\Big( \Big\| \frac{1}{N} \sum_{i=1}^{N} X_i \otimes X_i - \Sigma \Big\| \le \varepsilon \|\Sigma\| \Big) \qquad (1.3)$$

where $\Sigma$ is the covariance matrix of a centered random vector $X \in \mathbb{R}^n$ with
a log-concave distribution and $(X_i)$ are $N$ independent random vectors
distributed as $X$.

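Since for a symmetric matrix $M$, one has $\|M\| = \sup_{y \in S^{n-1}} |\langle My, y \rangle|$, (1.1)
is implied by

$$\Big| \frac{1}{N} \sum_{i=1}^{N} \big( \langle X_i, y \rangle^2 - \mathbb{E} \langle X_i, y \rangle^2 \big) \Big| \le \varepsilon \langle \Sigma y, y \rangle \quad \text{for all } y \in \mathbb{R}^n. \qquad (1.4)$$

In the case when the covariance matrix is the identity, it is equivalent to

$$1 - \varepsilon \le \frac{1}{N} \sum_{i=1}^{N} \langle X_i, y \rangle^2 \le 1 + \varepsilon \quad \text{for all } y \in S^{n-1}. \qquad (1.5)$$

Because of the linear invariance, there is no loss of generality in considering
just the case when the covariance matrix is the identity.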

In this framework, a breakthrough was achieved in [7] where it was proved
that for any $\varepsilon, \delta \in (0, 1)$, there exists $C(\varepsilon, \delta) > 0$ such that if a body $K$ is
isotropic then $N = C(\varepsilon, \delta)\, n \log^3 n$ i.i.d. uniformly distributed points on $K$
satisfy (1.2). This estimate was further improved to $N = C(\varepsilon, \delta)\, n \log^2 n$ in
[23] and to $N = C(\varepsilon, \delta)\, n \log n$ in [9] and [22]; the former paper treated the
case when $K$ is invariant under every reflection with respect to coordinate
subspaces and the latter proved the estimate in full generality.

One should note that in all these results, the probability in (1.2) does not
go to 1 as $n$ goes to infinity, as one expects in this type of high dimensional
phenomena. This probability, $1 - \delta$, is given by a parameter $\delta$ and $C(\varepsilon, \delta)$
depends on it. Thus letting $\delta$ tend to zero may destroy the estimate on $N$. To
emphasize this important feature we will talk about overwhelming probability
if the probability goes to 1 as $n$ goes to infinity.

The first result establishing (1.1) with overwhelming probability was given
in [18]. When a body $K$ is invariant under every reflection with respect to
coordinate subspaces, it is proved in [2] that for any $\varepsilon \in (0, 1)$ there exists
$C(\varepsilon) > 0$ such that (1.5) holds whenever $N \ge C(\varepsilon)\, n$ and with probability
going to 1 as $n$ goes to infinity. Finally, the present paper shows, as a
consequence of our main results (Theorems 4.1 and 4.2), that the same is
true for an arbitrary body $K$ (in the isotropic position).

An important related direction concerns norms of random matrices with
independent log-concave columns (or rows). More precisely, let $X \in \mathbb{R}^n$
be a centered random vector with a log-concave distribution such that the
covariance matrix is the identity. Consider independent random vectors
$(X_i)_{i \le N}$ distributed as $X$ and define $A = A^{(N)}$ to be the $n \times N$ matrix
with $(X_i)_{i \le N}$ as columns. For $n$, $N$ arbitrary (and $N$ not too large, actually,
$n = N$ being the central case) the question is to prove an estimate for the
norm $\|A\|$ as an operator $A : \ell_2^N \to \ell_2^n$, valid with overwhelming probability.
This problem can be viewed as an "isomorphic form" of an upper estimate in
(1.5) (for $n = N$, say), and the papers discussed above provided some answers
- with "parasitic" logarithmic factors - to this question as well. The present
article gives optimal estimates for $\|A\|$ (in Theorem 3.6 and Corollaries 3.8
and 4.12): for example, for the square matrix, if $n = N$, we have $\|A\| \le C\sqrt{n}$
with overwhelming probability.

To observe still one more point of view, for arbitrary $n$ and $N$, consider
again $A = A^{(N)}$. The set of $n \times n$ matrices may be equipped with the
distribution of $AA^*$ to be a matrix probability space and because of the
analogy with Random Matrix Theory, in particular with the Wishart Ensemble,
let us call it a Log-concave Ensemble.

In the last decades, in Asymptotic Geometric Analysis, considerable work 
and progress have been achieved in understanding the properties of random 
vectors with log-concave distribution, and more recently, in understanding 
spectral properties of random matrices with independent rows (or columns) 
with log-concave distribution. It appears that in high dimension they behave
somewhat similarly as if the coordinates were independent. This leads by
analogy with Random Matrix Theory to questions on the spectrum of $AA^*$
similar to those of the Wishart Ensemble. One important difference is that
now the entries are dependent but strongly structured by the log-concavity
hypothesis.

Denote by $\lambda_1 = \lambda_1(A^{(N)}) \le \cdots \le \lambda_n = \lambda_n(A^{(N)})$ the eigenvalues of $AA^*$
(the squares of the singular values of $A$). It was proved in [21] that when
$n/N$ goes to $\beta \in (0, 1)$ as $n, N \to \infty$, then the empirical measures of the
eigenvalues have a limit. It is the so-called Marchenko-Pastur distribution,
as for the Wishart Ensemble when all entries of the matrix $A$ are i.i.d. It is
also known ([3]) in the case when all the entries of $A$ are i.i.d. (with a finite
fourth moment) and $\lim_{n \to +\infty} n/N = \beta \in (0, 1)$ that $\lim \lambda_1/N = (1 - \sqrt{\beta})^2$
and $\lim \lambda_n/N = (1 + \sqrt{\beta})^2$. One could conjecture that such results are also
valid in the log-concave setting. Nevertheless, these results are asymptotic
and not quantitative (given fixed dimension).

Problem (1.5) is of course equivalent to quantitative estimates for $\lambda_1(A^{(N)})$
and $\lambda_n(A^{(N)})$, that is of the support of the spectrum of $A$. An answer is given
by Proposition 4.4 where it is shown that for $n \le N \le \exp(\sqrt{n})$,

$$1 - C \sqrt{\frac{n}{N}} \log \frac{2N}{n} \le \frac{1}{N} \sum_{i=1}^{N} \langle X_i, y \rangle^2 \le 1 + C \sqrt{\frac{n}{N}} \log \frac{2N}{n} \quad \text{for all } y \in S^{n-1}$$

holds with probability larger than $1 - \exp(-c\sqrt{n})$, where $C, c > 0$ are
numerical constants. Thus, putting $\beta = n/N \in (0, 1)$, we get

$$1 - C \sqrt{\beta} \log(2/\beta) \le \frac{\lambda_1}{N} \le \frac{\lambda_n}{N} \le 1 + C \sqrt{\beta} \log(2/\beta)$$

with overwhelming probability. As a consequence already mentioned earlier,
$\|A\| \le C(\sqrt{N} + \sqrt{n})$ with overwhelming probability, where $C > 0$ is a
numerical constant (Corollary 4.12).

Our general method follows an approach that can be traced back to Bourgain
[7] (cf. also [10]). It relies upon a crucial new ingredient of a novel
chaining argument that in an essential way depends on the distribution of 
coordinates of a point on the unit sphere. What makes this approach work, 
by rather subtle estimates, is a special structure of the sets used for the 
chaining. 

To describe a very rough idea of this structure, involved in the proof of
Theorem 3.6 below, assume for simplicity that $m = n = 2^s$ and let $a_k = 2^{s-k}$
for $1 \le k \le s$. For each $k$, first consider the subset of the Euclidean unit
ball of all vectors that have the support of cardinality less than or
equal to $a_k$ and with the $\ell_\infty$ norm of the coordinates bounded by $\alpha_k$, and
then define $\mathcal{M}^{(k)}$ to be a preassigned net (in the Euclidean norm) of this
set, where $0 < \alpha_k < 1$ are judiciously fixed in advance. Using sets $\mathcal{M}^{(k)}$
in successive steps of chaining we arrive at the set $\mathcal{M}$ that consists of sums
$v = \sum_k v_k$ where $v_k$'s are mutually disjointly supported vectors from $\mathcal{M}^{(k)}$
(assuming that the Euclidean norm of $v$ is less than 2). As can be expected
the actual definition of $\mathcal{M}$ contains a number of delicate points which were
omitted here and can be found at the beginning of the proof of Theorem 3.6.
However it is given in just one step without discussing each individual step
of the chaining.

The paper is organized as follows. In Section 2 we present
some definitions and preliminary tools. In Section 3 we study the norm of a
restriction of the matrix $A = A^{(N)}$ defined by

$$A_m = \sup_{\substack{F \subset \{1, \ldots, N\} \\ |F| \le m}} \big\| A|_{\mathbb{R}^F} \big\| = \sup_{\substack{z \in S^{N-1} \\ |\mathrm{supp}\, z| \le m}} |Az|.$$

We show in Theorem 3.6 that with overwhelming probability,

$$A_m \le C \Big( \sqrt{n} + \sqrt{m} \log \frac{2N}{m} \Big).$$

In Section 4.1 we prove the result announced in the abstract, answering a
question from [12]. This theorem appears as a particular case of a more
general study of

$$\sup_{y \in S^{n-1}} \Big| \frac{1}{N} \sum_{i=1}^{N} \big( |\langle X_i, y \rangle|^p - \mathbb{E} |\langle X_i, y \rangle|^p \big) \Big|$$

defined for any $p \ge 1$. Such processes have been studied in [10], [11] and [17].

Section 4.2 describes several observations for norms of random matrices
from $\ell_2$ to $\ell_p$, $p \ge 2$. In the final Section we sketch a more elementary
proof of the main result of Section 4.1 when $p = 2$.



2 Notation and preliminaries 

We equip $\mathbb{R}^n$ and $\mathbb{R}^N$ with the natural scalar product $\langle \cdot, \cdot \rangle$ and the natural
Euclidean norm $| \cdot |$. We also denote by the same notation $| \cdot |$ the cardinality
of a set. In this paper, $X$ will denote a random vector in $\mathbb{R}^n$ and $(X_i)$
will be independent random vectors with the same distribution as $X$. By
$\mathrm{Id}$ we shall denote the identity on $\mathbb{R}^n$ and by $\Sigma = \Sigma(X) = \mathbb{E}\, X \otimes X$, the
covariance matrix of $X$ (here $X \otimes X$ is the rank one operator defined by
$X \otimes X(y) = \langle X, y \rangle X$, for all $y \in \mathbb{R}^n$). By $\|M\|$ we shall denote the operator
norm of a matrix $M$, that is $\|M\| = \sup_{|y| = 1} |My|$.

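Definition 2.1. A random vector $X \in \mathbb{R}^n$ is called isotropic if

$$\mathbb{E} \langle X, y \rangle = 0, \quad \mathbb{E} |\langle X, y \rangle|^2 = |y|^2 \quad \text{for all } y \in \mathbb{R}^n, \qquad (2.1)$$

in other words, if $X$ is centered and its covariance matrix is the identity:

$$\mathbb{E}\, X \otimes X = \mathrm{Id}.$$

Recall that a function $f : \mathbb{R}^n \to \mathbb{R}$ is called log-concave if for any $\theta \in [0, 1]$
and any $x_1, x_2 \in \mathbb{R}^n$,

$$f(\theta x_1 + (1 - \theta) x_2) \ge f(x_1)^{\theta} f(x_2)^{1 - \theta}.$$

Definition 2.2. A measure $\mu$ on $\mathbb{R}^n$ is log-concave if for any measurable
subsets $A$, $B$ of $\mathbb{R}^n$ and any $\theta \in [0, 1]$,

$$\mu(\theta A + (1 - \theta) B) \ge \mu(A)^{\theta} \mu(B)^{1 - \theta}$$

whenever the set

$$\theta A + (1 - \theta) B = \{ \theta x_1 + (1 - \theta) x_2 : x_1 \in A,\ x_2 \in B \}$$

is measurable.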

The Brunn-Minkowski inequality provides examples of log-concave measures,
namely the uniform Lebesgue measure on compact convex subsets of
$\mathbb{R}^n$ as well as their marginals (cf. e.g., [24]). More generally, Borell's
theorem [5] characterizes the log-concave measures that are not supported by
any hyperplane as the absolutely continuous measures (with respect to the
Lebesgue measure) with a log-concave density. Note that the distribution
of an isotropic vector is not supported by any hyperplane. Moreover, it is
known [6] that if a measure is log-concave then linear functionals exhibit a
sub-exponential decay. To be more precise, recall that for a random variable
$Y$, the $\psi_1$ norm of $Y$ is

$$\|Y\|_{\psi_1} = \inf \big\{ C > 0 \,;\ \mathbb{E} \exp\big( |Y| / C \big) \le 2 \big\}.$$

A straightforward computation shows that for every integer $p \ge 1$,

$$(\mathbb{E} |Y|^p)^{1/p} \le c\, p\, \|Y\|_{\psi_1}, \qquad (2.2)$$

where $c$ is an absolute constant.
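As an illustration of the $\psi_1$ norm and of (2.2), consider the worked example of a variable $Y$ with $\mathbb{P}(|Y| > t) = e^{-t}$ (this choice is an assumption made here for illustration, it does not come from the paper). Then $\mathbb{E} \exp(|Y|/C) = C/(C-1)$ for $C > 1$, so the infimum is attained at $C = 2$, while $\mathbb{E}|Y|^p = p!$. The sketch below recovers the norm by bisection and checks that $(p!)^{1/p} \le p\, \|Y\|_{\psi_1}$; all function names are hypothetical.

```python
import math


def exp_moment(c):
    """E exp(|Y|/c) when |Y| is standard exponential; infinite for c <= 1."""
    if c <= 1.0:
        return math.inf
    return c / (c - 1.0)


def psi1_norm_exponential(tol=1e-12):
    """Bisection for inf{c > 0 : E exp(|Y|/c) <= 2}."""
    lo, hi = 1.0, 10.0  # the condition fails at lo and holds at hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if exp_moment(mid) <= 2.0:
            hi = mid
        else:
            lo = mid
    return hi


def moment_norm(p):
    """(E|Y|^p)^(1/p) = (p!)^(1/p) for the standard exponential."""
    return math.factorial(p) ** (1.0 / p)
```

In this example the ratio `moment_norm(p) / (p * psi1_norm_exponential())` stays bounded in `p`, which is exactly the linear growth in $p$ expressed by (2.2).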

We can now state the sub-exponential decay of linear functionals in terms
of the $\psi_1$ norm [6]:

Lemma 2.3. Let $X \in \mathbb{R}^n$ be a centered random vector with a log-concave
distribution. Then for every $y \in S^{n-1}$,

$$\| \langle X, y \rangle \|_{\psi_1} \le \psi\, (\mathbb{E} |\langle X, y \rangle|^2)^{1/2},$$

where $\psi > 0$ is a universal constant. Moreover, if $X$ has a symmetric
distribution then $\psi = 2$.

The moreover part easily follows by a direct calculation (see [20]).
Putting together (2.2) and Lemma 2.3 we get that for every $y \in S^{n-1}$,

$$(\mathbb{E} |\langle X, y \rangle|^p)^{1/p} \le C p\, (\mathbb{E} |\langle X, y \rangle|^2)^{1/2}, \qquad (2.3)$$

where $C$ is an absolute positive constant.



3 Norm of a random matrix 

In this Section $X_1, \ldots, X_N$ are independent random vectors in $\mathbb{R}^n$. Mostly we
work with i.i.d. random vectors, distributed according to an isotropic,
log-concave probability measure on $\mathbb{R}^n$. The random $n \times N$ matrix whose columns
are the $X_i$'s is denoted by $A$ and its operator norm from $\ell_2^N$ to $\ell_2^n$ is denoted by
$\|A\|$. We will also use the following related notation, for $1 \le m \le N$,

$$A_m = \sup_{\substack{F \subset \{1, \ldots, N\} \\ |F| \le m}} \big\| A|_{\mathbb{R}^F} \big\| = \sup_{\substack{z \in S^{N-1} \\ |\mathrm{supp}\, z| \le m}} |Az|.$$

Note that $A_m$ is increasing in $m$. Given a set $E \subset \{1, \ldots, N\}$, by $P_E$ we denote
the orthogonal projection from $\mathbb{R}^N$ onto the coordinate subspace of vectors whose
support is in $E$. Such a subspace is denoted by $\mathbb{R}^E$.
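The definition of $A_m$ can be checked by brute force on very small matrices. The following sketch enumerates all supports $F$ with $|F| = m$ (which suffices, since the restricted norm is monotone in $F$) and approximates the largest singular value of each column submatrix by power iteration on its Gram matrix. The helper names are assumptions made for this illustration, and the cost is exponential in $N$, so this is only meant for toy sizes.

```python
import itertools
import math
import random


def top_singular_value(columns, iterations=500, seed=0):
    """Largest singular value of the matrix with the given columns."""
    k = len(columns)
    if k == 0:
        return 0.0
    # Gram matrix G = A_F^T A_F; its top eigenvalue is the squared norm.
    gram = [[sum(a * b for a, b in zip(columns[i], columns[j]))
             for j in range(k)] for i in range(k)]
    rng = random.Random(seed)
    v = [rng.gauss(0.0, 1.0) for _ in range(k)]
    estimate = 0.0
    for _ in range(iterations):
        length = math.sqrt(sum(c * c for c in v))
        if length == 0.0:
            return 0.0
        v = [c / length for c in v]
        v = [sum(gram[i][j] * v[j] for j in range(k)) for i in range(k)]
        estimate = math.sqrt(sum(c * c for c in v))  # |G v| for unit v
    return math.sqrt(estimate)


def restricted_norm(columns, m):
    """A_m: maximum over supports F of size m of the norm of A restricted to F."""
    best = 0.0
    for support in itertools.combinations(range(len(columns)), m):
        best = max(best, top_singular_value([columns[i] for i in support]))
    return best
```

For the columns `[(1, 0), (1, 0), (0, 1)]` one has $A_1 = 1$ (the largest column length) and $A_2 = A_3 = \sqrt{2}$, in line with the monotonicity noted above.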
Lemma 3.1. Let $X_1, \ldots, X_N$ be i.i.d. random vectors, distributed according
to an isotropic, log-concave probability measure on $\mathbb{R}^n$. There exists an
absolute positive constant $C_0$ such that for any $N \le \exp(\sqrt{n})$ and for every
$K \ge 1$ one has

$$\max_{i \le N} |X_i| \le C_0 K \sqrt{n}$$

with probability at least $1 - \exp(-K\sqrt{n})$.

Proof. By [22] we have for every $i \le N$

$$\mathbb{P}\big( |X_i| \ge C t \sqrt{n} \big) \le \exp(-t c \sqrt{n}),$$

where $C$ and $c$ are absolute positive constants. The result follows by the
union bound (and adjusting absolute constants). □

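Lemma 3.2. Let $x_1, \ldots, x_N \in \mathbb{R}^n$. There exists a set $E \subset \{1, \ldots, N\}$, such
that

$$\sum_{i \ne j} \langle x_i, x_j \rangle \le 4 \sum_{i \in E} \sum_{j \in E^c} \langle x_i, x_j \rangle.$$

Proof. Clearly one has

$$\sum_{i \ne j} \langle x_i, x_j \rangle = 2^{-(N-2)} \sum_{E \subset \{1, \ldots, N\}} \sum_{i \in E} \sum_{j \in E^c} \langle x_i, x_j \rangle,$$

from which the lemma follows. □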
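The averaging identity behind Lemma 3.2 can be verified directly for a handful of vectors: each ordered pair $(i, j)$ with $i \ne j$ satisfies $i \in E$, $j \notin E$ for exactly $2^{N-2}$ of the $2^N$ subsets $E$, so the best subset is at least as good as the average. The sketch below enumerates all subsets; the function names are illustrative assumptions, and the enumeration is exponential in $N$.

```python
import itertools


def inner(u, v):
    """Euclidean scalar product of two vectors given as sequences."""
    return sum(a * b for a, b in zip(u, v))


def off_diagonal_sum(vectors):
    """Sum of <x_i, x_j> over all ordered pairs with i != j."""
    count = len(vectors)
    return sum(inner(vectors[i], vectors[j])
               for i in range(count) for j in range(count) if i != j)


def cross_sum(vectors, subset):
    """Sum of <x_i, x_j> over i in the subset E and j in its complement."""
    inside = set(subset)
    outside = [j for j in range(len(vectors)) if j not in inside]
    return sum(inner(vectors[i], vectors[j]) for i in inside for j in outside)


def all_subsets(count):
    """All subsets of {0, ..., count - 1}, as tuples."""
    for size in range(count + 1):
        yield from itertools.combinations(range(count), size)


def best_subset(vectors):
    """A subset E maximizing the cross sum; it satisfies Lemma 3.2."""
    return max(all_subsets(len(vectors)), key=lambda s: cross_sum(vectors, s))
```

With integer coordinates the arithmetic is exact, so both the identity and the inequality of the lemma can be asserted without tolerances.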



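Now, given a set $E \subset \{1, \ldots, N\}$ and $\varepsilon, \alpha \in (0, 1]$, by $\mathcal{N}(E, \varepsilon, \alpha)$ we denote an
$\varepsilon$-net of $B_2^N \cap \alpha B_\infty^N \cap \mathbb{R}^E$ in the Euclidean metric. A standard volume estimate
shows that we may assume that the cardinality of $\mathcal{N}(E, \varepsilon, \alpha)$ does not exceed
$(3/\varepsilon)^m$, where $m$ is the cardinality of $E$.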

We will need the following two lemmas. 



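Lemma 3.3. Let $X_1, \ldots, X_N$ be independent random vectors in $\mathbb{R}^n$ and let
$\psi > 0$ be such that

$$\sup_{i \le N} \sup_{y \in S^{n-1}} \| \langle X_i, y \rangle \|_{\psi_1} \le \psi.$$

Let $m \le N$, $\varepsilon, \alpha \in (0, 1]$ and $L \ge 2m \log \frac{12 e N}{m \varepsilon}$. Then

$$\mathbb{P}\Big( \sup_{\substack{F \subset \{1, \ldots, N\} \\ |F| \le m}} \sup_{E \subset F} \sup_{z \in \mathcal{N}(F, \varepsilon, \alpha)} \sum_{i \in E} \Big\langle z_i X_i, \sum_{j \in F \setminus E} z_j X_j \Big\rangle > \psi \alpha L A_m \Big) \le e^{-L/2}.$$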
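Proof. Denote the underlying probability space by $\Omega$. For $F \subset \{1, \ldots, N\}$
with $|F| \le m$, $E \subset F$, and $z \in \mathcal{N}(F, \varepsilon, \alpha)$, define the subset $\Omega(F, E, z)$ of $\Omega$
by

$$\Omega(F, E, z) = \Big\{ \sum_{i \in E} \Big\langle z_i X_i, \sum_{j \in F \setminus E} z_j X_j \Big\rangle > \psi \alpha L A_m \Big\}.$$

Fix $F$, $E$ and $z$ as above and set $y = \sum_{j \in F \setminus E} z_j X_j$. Clearly, $y$ is independent
of the vectors $X_i$, $i \in E$, and $|y| \le A_m$. Note that $|y| > 0$ on $\Omega(F, E, z)$
(otherwise $\langle z_i X_i, y \rangle = 0$ for all $i \in E$ and the sharp inequality defining
$\Omega(F, E, z)$ would be violated). Thus, using the fact that $\|z\|_\infty \le \alpha$, we
obtain

$$\sum_{i \in E} \Big\langle z_i X_i, \sum_{j \in F \setminus E} z_j X_j \Big\rangle \le \alpha A_m \sum_{i \in E} |\langle X_i, y/|y| \rangle|$$

on $\Omega(F, E, z)$. Since $A_m > 0$ on $\Omega(F, E, z)$, this implies

$$\mathbb{P}(\Omega(F, E, z)) \le \mathbb{P}\Big( \sum_{i \in E} |\langle X_i, y/|y| \rangle| > \psi L \Big).$$

On the other hand, by Chebyshev's inequality and the assumption on the
$\psi_1$-norms of linear functionals, the latter probability is less than

$$e^{-L}\, \mathbb{E} \exp\Big( \frac{1}{\psi} \sum_{i \in E} |\langle X_i, y/|y| \rangle| \Big) \le 2^{|E|} e^{-L} \le 2^m e^{-L}.$$

Therefore by the union bound,

$$\mathbb{P}\Big( \sup_{\substack{F \subset \{1, \ldots, N\} \\ |F| \le m}} \sup_{E \subset F} \sup_{z \in \mathcal{N}(F, \varepsilon, \alpha)} \sum_{i \in E} \Big\langle z_i X_i, \sum_{j \in F \setminus E} z_j X_j \Big\rangle > \psi \alpha L A_m \Big)$$
$$\le \sum_{k=1}^{m} \binom{N}{k}\, 2^m \Big( \frac{3}{\varepsilon} \Big)^m 2^m e^{-L} \le \Big( \frac{eN}{m} \Big)^m \Big( \frac{12}{\varepsilon} \Big)^m e^{-L} = \exp\Big( m \log \frac{12 e N}{m \varepsilon} - L \Big),$$

which implies the result. □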



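We will also need another lemma of a similar type. We provide the proof
for the sake of completeness.

Lemma 3.4. Let $X_1, \ldots, X_N$ be independent random vectors in $\mathbb{R}^n$ and let
$\psi > 0$ be such that

$$\sup_{i \le N} \sup_{y \in S^{n-1}} \| \langle X_i, y \rangle \|_{\psi_1} \le \psi.$$

Let $1 \le k, m \le N$, $\varepsilon, \alpha \in (0, 1]$, $\beta > 0$, and $L > 0$. Let $B(m, \beta)$ denote the
set of vectors $x \in \beta B_2^N$ with $|\mathrm{supp}\, x| \le m$ and let $\mathcal{B}$ be a subset of $B(m, \beta)$
of cardinality $M$. Then

$$\mathbb{P}\Big( \sup_{\substack{F \subset \{1, \ldots, N\} \\ |F| \le k}} \sup_{x \in \mathcal{B}} \sup_{z \in \mathcal{N}(F, \varepsilon, \alpha)} \sum_{i \in F} \Big\langle z_i X_i, \sum_{j \notin F} x_j X_j \Big\rangle > \psi \alpha \beta L A_m \Big) \le M \Big( \frac{6 e N}{k \varepsilon} \Big)^k e^{-L}.$$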



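Proof. The proof is analogous to the argument in Lemma 3.3. For $F \subset
\{1, \ldots, N\}$ with $|F| \le k$, $x \in \mathcal{B}$, and $z \in \mathcal{N}(F, \varepsilon, \alpha)$ consider

$$\Omega(F, x, z) = \Big\{ \sum_{i \in F} \Big\langle z_i X_i, \sum_{j \notin F} x_j X_j \Big\rangle > \psi \alpha \beta L A_m \Big\}.$$

Fix $F$, $x$, $z$ as above and set $y = \sum_{j \notin F} x_j X_j$. Clearly, $y$ is independent
of the vectors $X_i$, $i \in F$; moreover, $|y| \le \beta A_m$, and, similarly as before,
$|y| > 0$ on $\Omega(F, x, z)$. Thus, using the fact that $\|z\|_\infty \le \alpha$, we obtain

$$\sum_{i \in F} \Big\langle z_i X_i, \sum_{j \notin F} x_j X_j \Big\rangle \le \alpha \beta A_m \sum_{i \in F} |\langle X_i, y/|y| \rangle|$$

on $\Omega(F, x, z)$. Therefore, again as in Lemma 3.3, we have

$$\mathbb{P}(\Omega(F, x, z)) \le \mathbb{P}\Big( \sum_{i \in F} |\langle X_i, y/|y| \rangle| > \psi L \Big) \le e^{-L}\, \mathbb{E} \exp\Big( \frac{1}{\psi} \sum_{i \in F} |\langle X_i, y/|y| \rangle| \Big) \le 2^{|F|} e^{-L} \le 2^k e^{-L}.$$

By the union bound we get

$$\mathbb{P}\Big( \sup_{\substack{F \subset \{1, \ldots, N\} \\ |F| \le k}} \sup_{x \in \mathcal{B}} \sup_{z \in \mathcal{N}(F, \varepsilon, \alpha)} \sum_{i \in F} \Big\langle z_i X_i, \sum_{j \notin F} x_j X_j \Big\rangle > \psi \alpha \beta L A_m \Big) \le M \Big( \frac{eN}{k} \Big)^k \Big( \frac{3}{\varepsilon} \Big)^k 2^k e^{-L} = M \Big( \frac{6 e N}{k \varepsilon} \Big)^k e^{-L},$$

which proves the result. □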



Remark 3.5. Observe that if the $X_i$'s are i.i.d. random vectors, distributed
according to an isotropic, log-concave probability measure on $\mathbb{R}^n$, then, by
Lemma 2.3, they satisfy the condition on the $\psi_1$-norm of Lemmas 3.3 and
3.4.

Theorem 3.6. Let $n \ge 1$ and $1 \le N \le e^{\sqrt{n}}$ be integers. Let $X_1, \ldots, X_N$
be i.i.d. random vectors, distributed according to an isotropic, log-concave
probability measure on $\mathbb{R}^n$. Let $K \ge 1$. Then there are absolute positive
constants $C$ and $c$ such that

$$\mathbb{P}\Big( \exists\, m \le N : A_m \ge C K \Big( \sqrt{n} + \sqrt{m} \log \frac{2N}{m} \Big) \Big) \le \exp(-c K \sqrt{n}).$$

Remark 3.7. Let $X \in \mathbb{R}^n$ be a random vector with an isotropic exponential
distribution, that is with the density defined for $x = (x_i) \in \mathbb{R}^n$ by
$\prod_i \frac{1}{\sqrt{2}} \exp(-\sqrt{2}\, |x_i|)$. It is clearly an isotropic vector with a log-concave
distribution. Consider now the matrix $A^{(N)}$ built as before from a sample of $X$
of size $N$. Since

$$\mathbb{P}(|X| \ge t\sqrt{n}) \ge \int_{|s| \ge t\sqrt{n}} \frac{1}{\sqrt{2}} \exp(-\sqrt{2}\, |s|)\, ds = \exp(-\sqrt{2}\, t \sqrt{n}),$$

we get that for any $1 \le m \le N$,

$$\mathbb{P}(A_m \ge t\sqrt{n}) \ge \exp(-\sqrt{2}\, t \sqrt{n}).$$

This shows that the probability estimate in Theorem 3.6 is optimal up to
numerical constants. The analysis of this example shows that up to numerical
constants the logarithmic term in the estimate of $A_m$ in Theorem 3.6 is also
optimal (for the details see [1]).
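The one-dimensional computation used in Remark 3.7 can be checked numerically. The sketch below integrates the density $\frac{1}{\sqrt{2}} e^{-\sqrt{2}|s|}$ with a midpoint rule and compares the two-sided tail with the closed form $e^{-\sqrt{2} a}$; it also confirms that the second moment equals 1, which is the isotropic normalization. The truncation point, the step count and the function names are assumptions of this illustration.

```python
import math

SQRT2 = math.sqrt(2.0)


def density(s):
    """Density of one coordinate: exp(-sqrt(2)|s|) / sqrt(2)."""
    return math.exp(-SQRT2 * abs(s)) / SQRT2


def tail_closed_form(a):
    """P(|s| >= a) for a >= 0, from the exact integral."""
    return math.exp(-SQRT2 * a)


def tail_numeric(a, upper=40.0, steps=200000):
    """Midpoint rule on [a, upper], doubled by symmetry of the density."""
    h = (upper - a) / steps
    total = sum(density(a + (k + 0.5) * h) for k in range(steps))
    return 2.0 * h * total


def second_moment_numeric(upper=40.0, steps=200000):
    """Midpoint rule for the integral of s^2 * density(s) over the real line."""
    h = upper / steps
    total = 0.0
    for k in range(steps):
        s = (k + 0.5) * h
        total += s * s * density(s)
    return 2.0 * h * total
```

Taking $a = t\sqrt{n}$ in `tail_closed_form` gives the lower bound $\exp(-\sqrt{2}\, t\sqrt{n})$ stated in the remark.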
Letting $m = N$ we get a clearly optimal estimate for the operator norm
$\|A\|$, valid with overwhelming probability.

Corollary 3.8. In the setting of Theorem 3.6 we get, for every $K \ge 1$,

$$\|A\| \le C K \big( \sqrt{n} + \sqrt{N} \big), \qquad (3.1)$$

with probability at least $1 - e^{-cK\sqrt{n}}$, where $C, c > 0$ are absolute constants.

Remark 3.9. The final remark of [7] states that by refining a bit the method
of proof of Lemma 2 of that paper one may obtain that if $X_1, \ldots, X_n$ are $n$
independent vectors in $\mathbb{R}^n$ distributed according to a probability measure $\mu$
on $\mathbb{R}^n$ satisfying $\| \langle x, y \rangle \|_{\psi_1} \le 1/\sqrt{n}$ for all $y \in S^{n-1}$, then, with probability
$1 - \delta$, the matrix $A$ admits the bound for the operator norm



$$\|A\| \le C(\delta) \Big( \int \max_{i \le n} |X_i|\, d\mu + 1 \Big).$$



By Lemmas 2.3 and 3.1 and taking into account the normalization, this
would imply a version of (3.1) with $N = n$ and probability $1 - \delta$.



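Remark 3.10. Note that $\sqrt{n} + \sqrt{m} \log \frac{2N}{m}$ in the formula in Theorem 3.6
can be substituted with

$$\sqrt{n} + \sqrt{m} \log \frac{2N}{\max\{n, m\}}.$$

Indeed, if $m \ge n$ there is nothing to prove, otherwise

$$\sqrt{n} + \sqrt{m} \log \frac{2N}{m} = \sqrt{n} + \sqrt{m} \log \frac{n}{m} + \sqrt{m} \log \frac{2N}{n} \le 2\sqrt{n} + \sqrt{m} \log \frac{2N}{n}.$$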
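The last inequality in Remark 3.10 rests on the elementary bound $\sqrt{m} \log(n/m) \le \sqrt{n}$ for $m \le n$: writing $t = \sqrt{m/n}$, the ratio equals $2t \log(1/t)$, whose maximum over $(0, 1]$ is $2/e$. The following sketch checks both the bound and the resulting inequality on a small grid; the grid and the function names are arbitrary choices made for illustration.

```python
import math


def original_term(n, m, big_n):
    """sqrt(n) + sqrt(m) * log(2N/m), the expression in Theorem 3.6."""
    return math.sqrt(n) + math.sqrt(m) * math.log(2.0 * big_n / m)


def substituted_bound(n, m, big_n):
    """2*sqrt(n) + sqrt(m) * log(2N/n), the upper bound valid for m <= n."""
    return 2.0 * math.sqrt(n) + math.sqrt(m) * math.log(2.0 * big_n / n)


def worst_ratio(max_n):
    """Maximum of sqrt(m) * log(n/m) / sqrt(n) over 1 <= m <= n <= max_n."""
    worst = 0.0
    for n in range(1, max_n + 1):
        for m in range(1, n + 1):
            ratio = math.sqrt(m) * math.log(n / m) / math.sqrt(n)
            worst = max(worst, ratio)
    return worst
```

On any such grid `worst_ratio` stays below $2/e \approx 0.74$, so the constant 2 in front of $\sqrt{n}$ is not tight, only convenient.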



Finally, another immediate consequence.

Corollary 3.11. There are absolute positive constants $C$ and $c$ such that for
every $n \ge 1$, $1 \le N \le e^{\sqrt{n}}$, $K \ge 1$, and $X_i$'s as in Theorem 3.6 one has

$$\mathbb{P}\Big( \exists\, E \subset \{1, \ldots, N\} : \Big| \sum_{i \in E} X_i \Big| \ge C K \Big( \sqrt{n |E|} + |E| \log \frac{2N}{|E|} \Big) \Big) \le \exp(-c K \sqrt{n}).$$

Proof. Given $E$ set $m = |E|$. Consider the vector $z \in S^{N-1}$ defined by $z_i =
1/\sqrt{m}$ if $i \in E$ and $z_i = 0$ otherwise. We have

$$\Big| \sum_{i \in E} X_i \Big| = \sqrt{m}\, |Az| \le \sqrt{m}\, A_m.$$

Therefore Theorem 3.6 and Remark 3.7 imply the result. □

Proof of Theorem 3.6. As $N \le e^{\sqrt{n}}$, it is easy to see, by applying the
union bound and adjusting absolute constants, that it is sufficient to prove
that for $K$ sufficiently large and every fixed $m \le N$, one has

$$\mathbb{P}\Big( A_m \ge C K \Big( \sqrt{n} + \sqrt{m} \log \frac{2N}{m} \Big) \Big) \le \exp(-c K \sqrt{n}).$$

We shall define a set $\mathcal{M}$ of vectors with a special structure and supports
less than or equal to $m$ which serves simultaneously two purposes: we will
be able to estimate with large probability $\sup_{x \in \mathcal{M}} |Ax|$, and we will use $\mathcal{M}$
to approximate an arbitrary vector from $B_2^N$ of support less than or equal to
$m$. Then a standard argument will lead to the required estimate for $A_m$.

First observe that if for a vector $x \in S^{N-1}$ there is a simultaneous control
of the size of support and its $\ell_\infty$-norm (more precisely, $|\mathrm{supp}\, x| \sim s$ and
$\|x\|_\infty \le s^{-1/2}$, for some $s \ge 1$) then $|Ax|$ can be estimated, with large
probability, directly by using Lemmas 3.2 and 3.3 (it is also a part of the
estimates below). It is therefore natural to expect vectors from $\mathcal{M}$ to be
sums of (disjointly supported) vectors admitting such a simultaneous control
as above. Formally, the definition of $\mathcal{M}$ splits into two cases. If

$$m \log \frac{48 e N}{m} \le \sqrt{n} \qquad (3.2)$$

we set

$$\mathcal{M} = \bigcup_{\substack{E \subset \{1, \ldots, N\} \\ |E| = m}} \mathcal{N}(E, 1/4, 1).$$

Otherwise, let $l$ be the smallest integer such that

$$\frac{m}{2^l} \log \frac{48 e\, 2^l N}{m} \le \sqrt{n} \qquad (3.3)$$
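and fix positive integers $a_0, a_1, \ldots, a_l$ such that $a_k \le m 2^{-k+1}$ for $1 \le k \le l$
and $a_0 \le m 2^{-l}$, and $\sum_{k=0}^{l} a_k = m$. (We shall later set $a_k := [m 2^{-k+1}] -
[m 2^{-k}]$ for $1 \le k \le l$ and $a_0 := [m 2^{-l}]$.)

Then set $\mathcal{M} = \mathcal{M}_0 \cap 2 B_2^N$, where $\mathcal{M}_0$ consists of all vectors of the form
$x = \sum_{k=0}^{l} x_k$ where the $x_k$'s have disjoint supports and

$$x_0 \in \bigcup_{\substack{E \subset \{1, \ldots, N\} \\ |E| \le a_0}} \mathcal{N}(E, 1/4, 1), \qquad x_k \in \bigcup_{\substack{E \subset \{1, \ldots, N\} \\ |E| \le a_k}} \mathcal{N}\Big( E, 2^{-k}, \sqrt{2^k/m} \Big) \quad \text{for } k \ge 1.$$

Note that for every vector $x \in \mathcal{M}$ we have $|\mathrm{supp}\, x| \le \sum_{k=0}^{l} a_k = m$ and
$|x| \le 2$.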
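The block sizes $a_k$ announced in the parenthetical remark can be tabulated directly. The sketch below uses integer division for the integer part $[\cdot]$ and checks the two properties used in the construction, namely that the sizes sum to $m$ (the sum telescopes) and that $a_k \le m 2^{-k+1}$, $a_0 \le m 2^{-l}$. The function name is a hypothetical choice for this illustration; note that $a_0$ may be zero when $m < 2^l$.

```python
def block_sizes(m, l):
    """Return [a_0, a_1, ..., a_l] with a_0 = [m/2^l], a_k = [m/2^(k-1)] - [m/2^k]."""
    sizes = [m // 2 ** l]
    for k in range(1, l + 1):
        sizes.append(m // 2 ** (k - 1) - m // 2 ** k)
    return sizes


def sizes_are_admissible(m, l):
    """Check the sum and the size constraints stated in the text."""
    sizes = block_sizes(m, l)
    if sum(sizes) != m:
        return False
    if sizes[0] * 2 ** l > m:                # a_0 <= m 2^(-l)
        return False
    return all(sizes[k] * 2 ** (k - 1) <= m  # a_k <= m 2^(-k+1)
               for k in range(1, l + 1))
```

For example, `block_sizes(37, 3)` gives `[4, 19, 9, 5]`: roughly half of the coordinates fall in the first dyadic block, a quarter in the second, and so on.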

We shall consider the details of the case $m \log(48 e N/m) > \sqrt{n}$ (the other
case, when (3.2) holds, can be treated similarly; actually, it is even simpler,
since the construction of $\mathcal{M}$ is simpler). Fix $x \in \mathcal{M}$ of the form $x = \sum_{k=0}^{l} x_k$
and let $F_k$ be the support of $x_k$ (if there is more than one such representation,
we fix one of them). Denote the coordinates of $x$ by $x(i)$, $i \le N$;
then

$$|Ax|^2 = \Big\langle \sum_{i \le N} x(i) X_i, \sum_{i \le N} x(i) X_i \Big\rangle = \sum_{i \le N} x(i)^2 |X_i|^2 + \sum_{i \ne j} \langle x(i) X_i, x(j) X_j \rangle$$
$$\le 2 \max_i |X_i|^2 + D_x \le 2 \max\{ 2 \max_i |X_i|^2, D_x \}, \qquad (3.4)$$

where

$$D_x = \sum_{i \ne j} \langle x(i) X_i, x(j) X_j \rangle.$$

Note that by Lemma 3.1, $\max_i |X_i| \le C_0 K \sqrt{n}$ with probability larger than
$1 - e^{-K\sqrt{n}}$, and we would like to get a similar estimate for $D_x$.

To this aim we split $D_x$ according to the structure of $x$. Namely we let

$$D'_x = \sum_{k=0}^{l} \sum_{\substack{i, j \in F_k \\ i \ne j}} \langle x(i) X_i, x(j) X_j \rangle$$

and

$$D''_x = 2 \sum_{k=1}^{l} \sum_{i \in F_k} \Big\langle x(i) X_i, \sum_{r \in G_k} \sum_{j \in F_r} x(j) X_j \Big\rangle,$$

where $G_k = \{0, k+1, k+2, \ldots, l\}$. Note that

$$D_x = D'_x + D''_x.$$

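We first estimate $D'_x$. By Lemma 3.2 we obtain that for every $k$ there
exists a subset $\bar{F}_k$ of $F_k$ such that

$$D'_x \le 4 \sum_{k=0}^{l} \sum_{i \in \bar{F}_k} \Big\langle x(i) X_i, \sum_{j \in F_k \setminus \bar{F}_k} x(j) X_j \Big\rangle$$
$$\le 4 \sup_{\substack{F \subset \{1, \ldots, N\} \\ |F| \le m/2^l}} \sup_{E \subset F} \sup_{v \in \mathcal{N}(F, 1/4, 1)} \sum_{i \in E} \Big\langle v_i X_i, \sum_{j \in F \setminus E} v_j X_j \Big\rangle$$
$$+\ 4 \sum_{k=1}^{l} \sup_{\substack{F \subset \{1, \ldots, N\} \\ |F| \le 2m/2^k}} \sup_{E \subset F} \sup_{v \in \mathcal{N}(F, 2^{-k}, \sqrt{2^k/m})} \sum_{i \in E} \Big\langle v_i X_i, \sum_{j \in F \setminus E} v_j X_j \Big\rangle.$$

We now apply Lemma 3.3 to each summand in the sum above with $L =
2K\sqrt{n}$, $\varepsilon = 1/4$, $\alpha = 1$ for the first summand (note that such an $L$ satisfies
the condition) and with $L = \frac{4m}{2^k} K \log \frac{12 e N 4^k}{m}$, $\varepsilon = 2^{-k}$, $\alpha = \sqrt{2^k/m}$ for $k \ge 1$.
By the union bound we obtain

$$\mathbb{P}\Big( \sup_{x \in \mathcal{M}} D'_x > 8 \psi K A_m \sqrt{n} + 16 \psi K A_m \sum_{k=1}^{l} \sqrt{\frac{m}{2^k}} \log \frac{12 e N 4^k}{m} \Big)$$
$$\le \exp(-K\sqrt{n}) + \sum_{k=1}^{l} \exp\Big( -K \frac{2m}{2^k} \log \frac{12 e N 4^k}{m} \Big),$$

where $\psi$ is the absolute constant from Lemma 2.3.

Therefore, the choice of $l$ implies the following bound, with some absolute
positive constant $C$,

$$\mathbb{P}\Big( \sup_{x \in \mathcal{M}} D'_x > A_m K \Big( 8 \psi \sqrt{n} + C \psi \sqrt{m} \log \frac{2N}{m} \Big) \Big) \le \exp(-K\sqrt{n}) + l \exp(-K\sqrt{n}) \le (2\sqrt{n} + 1) \exp(-K\sqrt{n}).$$

(We also used the estimate $l \le 2\sqrt{n}$, valid when $m \le N \le e^{\sqrt{n}}$.)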

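The estimate for $D''_x$ essentially follows the same lines. In a sense it is
simpler, since we don't need to apply Lemma 3.2. For every $1 \le k \le l$ we
consider $\mathcal{M}_k = \mathcal{M}'_k \cap 2 B_2^N$, where $\mathcal{M}'_k$ consists of all vectors of the form
$x = x_0 + \sum_{s=k+1}^{l} x_s$ where the $x_s$'s ($s = 0, k+1, \ldots, l$) have pairwise disjoint
supports and

$$x_0 \in \bigcup_{\substack{E \subset \{1, \ldots, N\} \\ |E| \le a_0}} \mathcal{N}(E, 1/4, 1), \qquad x_s \in \bigcup_{\substack{E \subset \{1, \ldots, N\} \\ |E| \le a_s}} \mathcal{N}\Big( E, 2^{-s}, \sqrt{2^s/m} \Big) \quad \text{for } s \ge k+1.$$

Then $\mathcal{M}_k \subset 2 B_2^N$ and

$$|\mathcal{M}_k| \le 12^{a_0} \binom{N}{a_0} \prod_{s=k+1}^{l} (3 \cdot 2^s)^{a_s} \binom{N}{a_s} \le \exp\Big( \frac{4m}{2^k} \log \frac{6 e 4^k N}{m} \Big).$$

We also observe that

$$D''_x = 2 \sum_{k=1}^{l} \sum_{i \in F_k} \Big\langle x(i) X_i, \sum_{r \in G_k} \sum_{j \in F_r} x(j) X_j \Big\rangle \le 2 \sum_{k=1}^{l} \sup_{\substack{F \subset \{1, \ldots, N\} \\ |F| \le 2m/2^k}} \sup_{u \in \mathcal{M}_k} \sup_{v \in \mathcal{N}(F, 2^{-k}, \sqrt{2^k/m})} \sum_{i \in F} \Big\langle v_i X_i, \sum_{j \notin F} u_j X_j \Big\rangle.$$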
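Now we apply Lemma 3.4 to each summand with

$$L = L_k = \frac{12m}{2^k} K \log \frac{12 e 4^k N}{m}, \quad \varepsilon = \varepsilon_k = 2^{-k}, \quad \alpha = \alpha_k = \sqrt{2^k/m}, \quad \beta = 2,$$

with $\mathcal{B} = \mathcal{M}_k$ and with $2m/2^k$ as the bound on $|F|$.
Using the union bound we obtain

$$\mathbb{P}\Big( \sup_{x \in \mathcal{M}} D''_x > 48 \psi A_m K \sum_{k=1}^{l} \sqrt{\frac{m}{2^k}} \log \frac{12 e 4^k N}{m} \Big)$$
$$\le \sum_{k=1}^{l} \exp\Big( \frac{4m}{2^k} \log \frac{12 e 4^k N}{m} + \frac{2m}{2^k} \log \frac{3 e 4^k N}{m} - \frac{12m}{2^k} K \log \frac{12 e 4^k N}{m} \Big)$$
$$\le \sum_{k=1}^{l} \exp\Big( -K \frac{6m}{2^k} \log \frac{12 e 4^k N}{m} \Big) \le l \exp\Big( -K \frac{6m}{2^l} \log \frac{12 e 4^l N}{m} \Big).$$

As in the case of $D'_x$ it follows that

$$\mathbb{P}\Big( \sup_{x \in \mathcal{M}} D''_x > 3 C \psi A_m K \sqrt{m} \log \frac{2N}{m} \Big) \le 2\sqrt{n} \exp(-K\sqrt{n}),$$

where $C$ is the same absolute constant as above. Since $D_x = D'_x + D''_x$, then

$$\mathbb{P}\Big( \sup_{x \in \mathcal{M}} D_x > K A_m \Big( 8 \psi \sqrt{n} + 4 C \psi \sqrt{m} \log \frac{2N}{m} \Big) \Big) \le (4\sqrt{n} + 1)\, e^{-K\sqrt{n}}. \qquad (3.5)$$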



Passing now to the approximation argument, pick an arbitrary $z \in S^{N-1}$
with $|\mathrm{supp}\, z| \le m$. Define the following subsets of $\{1, \ldots, N\}$ depending
on $z$. Denote the coordinates of $z$ by $z_i$ ($i = 1, \ldots, N$). Let $n_1, \ldots, n_N$ be
such that $|z_{n_1}| \ge |z_{n_2}| \ge \cdots \ge |z_{n_N}|$, so that $z_{n_i} = 0$ for $i > m$ (since
$|\mathrm{supp}\, z| \le m$). If condition (3.2) holds we denote the support of $z$ by $E_0$ and
consider only this $E_0$. Otherwise we set

$$E_0 = \{n_i\}_{1 \le i \le m/2^l}$$

and

$$E_1 = \{n_i\}_{m/2 < i \le m}, \quad E_2 = \{n_i\}_{m/4 < i \le m/2}, \quad \ldots, \quad E_l = \{n_i\}_{m/2^l < i \le m/2^{l-1}},$$

where $l$ is the smallest integer satisfying (3.3) (as before). (For small values
of $n$ it can happen that $E_0$ is empty, but it does not create any difficulty in
the proof below.) Clearly, we have

$$a_0 := |E_0| \le m/2^l, \qquad a_k := |E_k| \le m/2^k + 1 \le m/2^{k-1} \quad \text{for every } 1 \le k \le l$$

and $\sum_{k=0}^{l} a_k = m$. Note that the numbers $a_k$ do not depend on $z$, although
the sets $E_k$ do. Finally, since $z \in S^{N-1}$, we also observe that for every
$k \ge 1$,

$$\| P_{E_k} z \|_\infty \le |z_{n_s}| \le \sqrt{\frac{2^k}{m}},$$

where $s = [m/2^k]$.

Note that for every $k \ge 1$ the vector $P_{E_k} z$ can be approximated by a
vector from $\mathcal{N}\big( E_k, 2^{-k}, \sqrt{2^k/m} \big)$ and the vector $P_{E_0} z$ can be approximated
by a vector from $\mathcal{N}(E_0, 1/4, 1)$. Thus there exists $x \in \mathcal{M}$, with a suitable
representation $x = \sum_{k=0}^{l} x_k$, such that

$$|z - x|^2 \le \sum_{k=0}^{l} |P_{E_k} z - x_k|^2 \le 2^{-4} + \sum_{k=1}^{l} 2^{-2k} < 0.4.$$

Moreover, $x$ is chosen to have the same support as $z$, and thus $w = z - x$
has the support $|\mathrm{supp}\, w| \le m$.

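Considering all $z \in S^{N-1}$ with $|\mathrm{supp}\, z| \le m$ it follows that

$$A_m = \sup_{|\mathrm{supp}\, z| \le m} |Az| \le \sup_{x \in \mathcal{M}} |Ax| + \sqrt{0.4} \sup_{\substack{w \in S^{N-1} \\ |\mathrm{supp}\, w| \le m}} |Aw| = \sup_{x \in \mathcal{M}} |Ax| + \sqrt{0.4}\, A_m,$$

which implies

$$A_m \le 3 \sup_{x \in \mathcal{M}} |Ax|.$$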
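The two numerical constants in this approximation step are easy to confirm: the squared approximation error is at most $2^{-4} + \sum_{k \ge 1} 2^{-2k} = 1/16 + 1/3 < 0.4$, and absorbing the term $\sqrt{0.4}\, A_m$ into the left-hand side gives the factor $1/(1 - \sqrt{0.4}) \approx 2.72 \le 3$. The short sketch below, with hypothetical function names, reproduces this arithmetic.

```python
import math


def approximation_error_bound(l):
    """2^-4 + sum_{k=1}^{l} 2^(-2k), the bound on |z - x|^2 for a given l."""
    return 2.0 ** -4 + sum(2.0 ** (-2 * k) for k in range(1, l + 1))


def absorption_constant(delta_squared=0.4):
    """Constant c with A_m <= c * sup|Ax| when A_m <= sup|Ax| + sqrt(delta^2) * A_m."""
    return 1.0 / (1.0 - math.sqrt(delta_squared))
```

Any bound on $|z - x|^2$ strictly below 1 would allow the same absorption argument; the value 0.4 simply yields the convenient constant 3.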
Recall that by (3.4) for every $x \in \mathcal{M}$ we have

$$|Ax|^2 \le 2 \max\{ 2 \max_i |X_i|^2, D_x \},$$

so passing to the supremum

$$A_m^2 \le 9 \sup_{x \in \mathcal{M}} |Ax|^2 \le 9 \max\{ 4 \max_i |X_i|^2,\ 2 \sup_{x \in \mathcal{M}} D_x \}. \qquad (3.6)$$

Applying Lemma 3.1 and (3.5) we get

$$A_m \le K (6 C_0 + 144 \psi) \sqrt{n} + 72\, C \psi K \sqrt{m} \log \frac{2N}{m}$$

with probability larger than

$$1 - (4\sqrt{n} + 2) \exp(-K\sqrt{n}) \ge 1 - \exp(-c K \sqrt{n}),$$

where $c$ is an absolute positive constant. (In fact this estimate for probability
requires that $n$ is sufficiently large, but, as $K \ge 1$ was arbitrary, we can adjust
the constants.) This concludes the proof. □



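Remark 3.12. Consider now a more general situation in which $X_1, X_2, \ldots, X_N$
- the columns of the matrix $A$ - are still i.i.d. centered and log-concave, but
not necessarily isotropic. Then there exists an $n \times n$ matrix $T$, such that
$(X_i)_{i=1}^{N}$ has the same distribution as $(T Y_i)_{i=1}^{N}$, where $Y_1, \ldots, Y_N$ are isotropic
log-concave random vectors in $\mathbb{R}^n$. For the purpose of computing probabilities
we may assume that $X_i = T Y_i$. Therefore, with probability at least
$1 - \exp(-c K \sqrt{n})$, we have for all $m \le N$,

$$A_m = \sup_{y \in S^{n-1}} \sup_{\substack{z \in S^{N-1} \\ |\mathrm{supp}\, z| \le m}} \Big| \sum_{i=1}^{N} \langle X_i z_i, y \rangle \Big| = \sup_{y \in S^{n-1}} \sup_{\substack{z \in S^{N-1} \\ |\mathrm{supp}\, z| \le m}} \Big| \sum_{i=1}^{N} \langle Y_i z_i, T^* y \rangle \Big|$$
$$\le \|T^*\|\, C K \Big( \sqrt{n} + \sqrt{m} \log \frac{2N}{m} \Big) = C K \kappa \Big( \sqrt{n} + \sqrt{m} \log \frac{2N}{m} \Big),$$

where $\kappa = \|T^*\| = \sqrt{\|\Sigma\|}$ (note that $\Sigma = T T^*$).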



We conclude this section with a more technical variant of Theorem 3.6.
Note that in particular it requires weaker conditions on the $X_i$'s and does not
require any bounds on $N$.

Theorem 3.13. Let $1 \le n$ and $1 \le N$. Let $X_1, \ldots, X_N$ be independent
random vectors in $\mathbb{R}^n$ such that

$$\sup_{i \le N} \sup_{y \in S^{n-1}} \| \langle X_i, y \rangle \|_{\psi_1} \le \psi.$$

Let $A$ be a random $n \times N$ matrix whose columns are the $X_i$'s, and let $A_m$, $m \le N$,
be defined as before. Then for every $1 \le m \le N$, every $0 \le l \le \log m$, and
every $K \ge 1$ one has

$$\mathbb{P}\Big( A_m \ge C \psi K \Big( \frac{m}{2^l} \log \frac{48 e N 2^l}{m} + \sqrt{m} \log \frac{2N}{m} \Big) + 6 \max_{i \le N} |X_i| \Big) \le (1 + 2l) \exp\Big( -2K \frac{m}{2^l} \log \frac{12 e N 2^l}{m} \Big),$$

where $C$ is an absolute constant. In particular, choosing $0 \le l \le \log m$ to be
the largest integer satisfying

$$\frac{2m}{2^l} \log \frac{12 e N 2^l}{m} \ge \sqrt{m} \log \frac{2N}{m}$$

we obtain that for every $K \ge 1$

$$\mathbb{P}\Big( A_m \ge C \psi K \sqrt{m} \log \frac{2N}{m} + 6 \max_{i \le N} |X_i| \Big) \le (1 + 2 \log m) \exp\Big( -K \sqrt{m} \log \frac{2N}{m} \Big).$$



Remark 3.14. Note that from the definitions we immediately have

$$A_m \ge A_1 \ge \max_{i \le N} |X_i|.$$

For completeness we outline a proof of Theorem 13. 131 

Proof (Sketch.) We proceed as in the proof of Theorem 13.61 So first we 
construct M.. If / = we define M. exactly as after formula (13. 2p . otherwise 
it will be constructed in the same way as it was constructed after formula 
(13.31) (note that now / is a fixed number). Then we estimate Dr^ = D'^ + D". 
As before we use Lemmas 13.31 and 13.41 

The only difference is that for the first summand in the formula for D'^ 
we use Lemma O with L = AKf log instead of L = 2K^. It will 

give us that 

/ m 48eA^2' 2A^\ 

P sup > IQA^K^-r log + C AmK^ Vmlog 

\x£M 2' m my 

f or^^T 48eAr2'\ , / 12eNA^\ 

< exp -2K- log + / exp -2K- log 

\ 2' m J \ 2' m J 



21 



and 

P sup D'' > 'iC^A^K^Xog < / exp -K— log 

Thus, with another absolute positive constant C we have 

P sup > CArnKll) — log + v^log 

\2' m m J 



, 2m ^ 12eN2^ 
< (l + 2/)exp -ir— lo; 



2' " m 

Finally we apply the same approximation procedure. By (3.4) and approximation we get formula (3.6)

— niax{36 max 18 sup D^}, 

which implies the result, by adjusting constants if necessary. The "in particular" part of the theorem is trivial. □



Remark 3.15. It is possible to extend Theorem 3.13 to a $\psi_p$-setting, similar to the one considered in [10]. Let $p \in [1,2]$ and let $X$ be a random vector such that for some $\psi_p > 0$ one has

\[
\mathbb{E}\exp\big((|\langle X,y\rangle|/\psi_p)^p\big) \le 2
\]

for every $y \in S^{n-1}$. Then, adjusting Lemmas 3.3 and 3.4 and repeating the proof of Theorem 3.13, we can get

\[
\mathbb{P}\Big( A_m \ge C\psi_p K\sqrt{m}\Big(\log\frac{2N}{m}\Big)^{1/p} + 6\max_{i\le N}|X_i| \Big) \le (1+2\log m)\exp\Big(-K^p\sqrt{m}\log\frac{2N}{m}\Big).
\]

However, we will not pursue this direction here.
4 Kannan-Lovász-Simonovits question



In this section, we answer the question presented in the introduction: Let $K$ be an isotropic convex body in $\mathbb{R}^n$. Given $\varepsilon > 0$, how many independent points $X_i$ uniformly distributed on $K$ are needed for the empirical covariance matrix to approximate the identity up to $\varepsilon$ with overwhelming probability?

Let $X \in \mathbb{R}^n$ be a centered random vector with covariance matrix $S$ and consider $N$ independent random vectors $(X_i)_{i\le N}$ distributed as $X$. Using tools from the theory of empirical processes, we first prove a more general statement (Proposition 4.4) and then give applications to approximation of the empirical covariance matrix and to estimates of different norms of the matrix $A = A^{(N)}$. In a final subsection we give a more elementary proof of the case $p = 2$, which corresponds to the original question in [12].
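To make the question concrete, the following is a minimal numerical sketch, not part of the argument. It assumes the convex body is the cube $[-\sqrt{3},\sqrt{3}]^n$ (which is isotropic, since each coordinate has mean zero and variance one) and estimates the operator norm of the difference between the empirical covariance matrix and the identity by power iteration. All function names are hypothetical and chosen for this illustration only.

```python
import math
import random


def sample_cube_point(n, rng):
    """Uniform point in the cube [-sqrt(3), sqrt(3)]^n (mean 0, identity covariance)."""
    half_side = math.sqrt(3.0)
    return [rng.uniform(-half_side, half_side) for _ in range(n)]


def empirical_covariance(points):
    """Return (1/N) * sum_i x_i x_i^T as a list of rows."""
    n = len(points[0])
    cov = [[0.0] * n for _ in range(n)]
    for x in points:
        for j in range(n):
            for k in range(n):
                cov[j][k] += x[j] * x[k]
    return [[entry / len(points) for entry in row] for row in cov]


def spectral_norm_symmetric(mat, iterations=300):
    """Power-iteration estimate of the operator norm of a symmetric matrix.

    The returned value |M v| (v a unit vector) never exceeds the true norm
    and converges to it for a generic starting vector.
    """
    n = len(mat)
    v = [1.0 / (k + 1) for k in range(n)]
    estimate = 0.0
    for _ in range(iterations):
        length = math.sqrt(sum(c * c for c in v))
        if length == 0.0:
            return 0.0
        v = [c / length for c in v]
        v = [sum(mat[j][k] * v[k] for k in range(n)) for j in range(n)]
        estimate = math.sqrt(sum(c * c for c in v))
    return estimate


def covariance_deviation(n, num_points, seed):
    """Estimate || (1/N) sum X_i (x) X_i - Id || for N cube-uniform points."""
    rng = random.Random(seed)
    points = [sample_cube_point(n, rng) for _ in range(num_points)]
    cov = empirical_covariance(points)
    diff = [[cov[j][k] - (1.0 if j == k else 0.0) for k in range(n)]
            for j in range(n)]
    return spectral_norm_symmetric(diff)


if __name__ == "__main__":
    for num_points in (50, 500, 5000):
        print(num_points, covariance_deviation(3, num_points, seed=1))
```

The dimension here is tiny, so the sketch only illustrates the quantity being controlled; it says nothing about the sharp dependence of $N$ on $n$ that the results of this section address.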



4.1 Approximation of covariance matrix 

First note that because of the linear invariance, (1.5) implies

\[
\Big\|\frac{1}{N}\sum_{i=1}^{N} X_i\otimes X_i - S\Big\| \le \varepsilon\|S\|.
\]

Therefore without loss of generality we restrict ourselves to the case when the covariance matrix is the identity.
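The reduction can be spelled out as follows. This is a sketch under the additional assumption, made here only for illustration, that $S$ is invertible; the vectors $Y_i$ are auxiliary notation introduced for this computation. A linear image of a log-concave vector is again log-concave, so the $Y_i$ are isotropic and log-concave.

```latex
\[
Y_i := S^{-1/2}X_i, \qquad
\mathbb{E}\, Y_i\otimes Y_i = S^{-1/2}\, S\, S^{-1/2} = \mathrm{Id},
\]
\[
\frac{1}{N}\sum_{i=1}^{N} X_i\otimes X_i - S
= S^{1/2}\Big(\frac{1}{N}\sum_{i=1}^{N} Y_i\otimes Y_i - \mathrm{Id}\Big)S^{1/2},
\]
\[
\Big\|\frac{1}{N}\sum_{i=1}^{N} X_i\otimes X_i - S\Big\|
\le \|S^{1/2}\|^{2}\,\Big\|\frac{1}{N}\sum_{i=1}^{N} Y_i\otimes Y_i - \mathrm{Id}\Big\|
\le \varepsilon\,\|S\| .
\]
```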

Theorem 4.1. Let $X_1, \dots, X_N$ be i.i.d. random vectors, distributed according to an isotropic, log-concave probability measure on $\mathbb{R}^n$. For every $\varepsilon \in (0,1)$ and $t \ge 1$, there exists $C(\varepsilon,t) > 0$, such that if $C(\varepsilon,t)\,n \le N$, then with probability at least $1 - e^{-ct\sqrt{n}}$,

\[
\Big\|\frac{1}{N}\sum_{i=1}^{N} X_i\otimes X_i - \mathrm{Id}\Big\| \le \varepsilon, \tag{4.1}
\]

where $c > 0$ is an absolute constant. Moreover, one can take $C(\varepsilon,t) = Ct^{4}\varepsilon^{-2}\log^{2}(2t^{2}\varepsilon^{-2})$, where $C > 0$ is an absolute constant.

Since for a symmetric matrix $M$ one has $\|M\| = \sup_{y\in S^{n-1}} |\langle My, y\rangle|$, and $\mathbb{E}\langle X_i,y\rangle^2 = 1$, one can rewrite (4.1) as

\[
\sup_{y\in S^{n-1}}\Big|\frac{1}{N}\sum_{i=1}^{N}\big(\langle X_i,y\rangle^{2} - \mathbb{E}\langle X_i,y\rangle^{2}\big)\Big| \le \varepsilon.
\]

This way, approximating the covariance matrix becomes a special case of a more general problem, concerning the uniform approximation of the moments of one-dimensional marginals of an isotropic log-concave measure by their empirical counterparts. In particular, Theorem 4.1 is implied by the following result.

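Theorem 4.2. Let $X_1, \dots, X_N$ be i.i.d. random vectors, distributed according to an isotropic, log-concave probability measure on $\mathbb{R}^n$. For any $p \ge 2$ and for every $\varepsilon \in (0,1)$ and $t \ge 1$, there exists $C(\varepsilon,t,p) > 0$, such that if $C(\varepsilon,t,p)\,n^{p/2} \le N$, then with probability at least $1 - e^{-c_p t\sqrt{n}}$ (where $c_p > 0$ depends only on $p$),

\[
\sup_{y\in S^{n-1}}\Big|\frac{1}{N}\sum_{i=1}^{N}\big(|\langle X_i,y\rangle|^{p} - \mathbb{E}|\langle X_i,y\rangle|^{p}\big)\Big| \le \varepsilon. \tag{4.2}
\]

Moreover, one can take $C(\varepsilon,t,p) = C_p t^{2p}\varepsilon^{-2}\log^{2p-2}(2t^{2}\varepsilon^{-2})$, where $C_p$ depends only on $p$.

Remark 4.3. The proofs of both Theorems 4.1 and 4.2 use Theorem 3.6, which requires the condition $N \le \exp(\sqrt{n})$. For larger $N$, however, the result follows by a formal argument. Assume that the statement has been proved for $N \le \exp(\sqrt{n})$ and assume that $N > \exp(\sqrt{n})$. Let $X_i = \{X_i(k)\}_{k=1}^{n} \in \mathbb{R}^n$, $i \le N$, be the random vectors under consideration. Pick the smallest $m$ such that $N \le \exp(\sqrt{m})$. Clearly, $m > n$. Now consider random vectors $Y_i = \{Y_i(k)\}_{k=1}^{m} \in \mathbb{R}^m$, $i \le N$, defined by $Y_i(k) = X_i(k)$ for $k \le n$ and $Y_i(k) = g_{ik}$ for $k > n$, where the $g_{ik}$ are independent Gaussian $\mathcal{N}(0,1)$ random variables. Then the $Y_i$'s are isotropic log-concave random vectors to which the result can be applied. Identifying $y = \{y(k)\}_{k=1}^{n} \in S^{n-1}$ with $z = \{z(k)\}_{k=1}^{m} \in S^{m-1}$ defined by $z(k) = y(k)$ for $k \le n$, $z(k) = 0$ for $k > n$, we get

\[
\sup_{y\in S^{n-1}}\Big|\frac{1}{N}\sum_{i=1}^{N}\big(|\langle X_i,y\rangle|^{p} - \mathbb{E}|\langle X_i,y\rangle|^{p}\big)\Big|
\le \sup_{z\in S^{m-1}}\Big|\frac{1}{N}\sum_{i=1}^{N}\big(|\langle Y_i,z\rangle|^{p} - \mathbb{E}|\langle Y_i,z\rangle|^{p}\big)\Big| \le \varepsilon
\]

with probability even higher than claimed. Thus in the proofs of both theorems we may assume without loss of generality that $N \le \exp(\sqrt{n})$.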
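The key step of the padding argument of Remark 4.3 is elementary and can be written out explicitly. The computation below is a sketch in the notation of that remark; the symbol $c$ stands for the constant in the exponent of the probability estimate.

```latex
\[
\langle Y_i, z\rangle
= \sum_{k\le n} X_i(k)\,y(k) + \sum_{n<k\le m} g_{ik}\cdot 0
= \langle X_i, y\rangle ,
\]
so that the supremum over $y\in S^{n-1}$ is a supremum over the subset
$\{z\in S^{m-1} : z(k)=0 \text{ for } k>n\}$ of $S^{m-1}$. Moreover, since $m>n$,
\[
1 - e^{-ct\sqrt{m}} \;\ge\; 1 - e^{-ct\sqrt{n}} ,
\]
which is why the resulting probability is at least as high as the one claimed.
```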
In the first step of the proof of Theorem 4.2 we shall use some tools from probability in Banach spaces, in particular classical symmetrization and contraction methods as in [TT] and [TT]. These tools work for general empirical processes and are not necessary in our setting, since we are dealing more specifically with powers of linear forms. We choose this approach, though, as it requires fewer computations and leads to a unified, simpler and more transparent presentation.

Theorem 4.2 is an easy consequence of the following technical proposition applied with $s = t$.

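Proposition 4.4. In the setting of Theorem 4.2, if $n \le N \le e^{\sqrt{n}}$, then for any $s, t \ge 1$, the estimate

\[
\sup_{y\in S^{n-1}}\Big|\frac{1}{N}\sum_{i=1}^{N}\big(|\langle X_i,y\rangle|^{p} - \mathbb{E}|\langle X_i,y\rangle|^{p}\big)\Big|
\le C^{p}\,t\,s^{p-1}\log^{p-1}\Big(\frac{2N}{n}\Big)\sqrt{\frac{n}{N}} + C^{p}s^{p}\,\frac{n^{p/2}}{N} + C^{p}p^{p}\Big(\frac{n}{N}\Big)^{s} \tag{4.3}
\]

holds with probability at least

\[
1 - \exp(-cs\sqrt{n}) - \exp\big(-c_p\min\{u,v\}\big),
\]

where $u = t^{2}s^{2p-2}\,n\log^{2p-2}(2N/n)$, $v = ts^{-1}\sqrt{Nn}/\log(2N/n)$, $C, c > 0$ are absolute constants and $c_p > 0$ depends on $p$ only.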

Remark 4.5. The two parameters $s$ and $t$ play different roles in the proof and reflect different asymptotic behavior of the probability with which (4.4) holds. The first parameter $s$ is related to the level of truncation of linear forms, whereas the second is a factor in the deviation when one deals only with the truncated part. For instance, taking $s = \sqrt{t}$ allows us to get a probability converging to one as $t \to \infty$, if both dimensions are fixed.

Before we proceed to the proof of the above proposition, let us introduce some tools from the classical theory of probability in Banach spaces. Below, $\varepsilon_1, \dots, \varepsilon_N$ will always denote a sequence of independent Rademacher variables, independent of the sequence $X_1, \dots, X_N$.
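Lemma 4.6 (Contraction principle, see [TB], Theorem 4.12). Let $F : \mathbb{R}_+ \to \mathbb{R}_+$ be convex and increasing. Let further $\varphi_i : \mathbb{R} \to \mathbb{R}$, $i \le N$, be 1-Lipschitz with $\varphi_i(0) = 0$. Then, for any bounded set $T \subset \mathbb{R}^N$,

\[
\mathbb{E}F\Big(\frac{1}{2}\sup_{t\in T}\Big|\sum_{i=1}^{N}\varepsilon_i\varphi_i(t_i)\Big|\Big) \le \mathbb{E}F\Big(\sup_{t\in T}\Big|\sum_{i=1}^{N}\varepsilon_i t_i\Big|\Big).
\]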

Using standard symmetrization inequalities for sums of independent random variables (see e.g. Chapter 2.3 of [26]) and applying the lemma with $F(x) = x$ and $\varphi_i(s) = \frac{|s|^{p}\wedge B^{p}}{pB^{p-1}}$ for $s \in \mathbb{R}$, we obtain the following corollary.

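Corollary 4.7. Let $\mathcal{F}$ be a family of functions, uniformly bounded by $B > 0$. Then for any independent random variables $X_1, \dots, X_N$ and any $p \ge 1$, we have

\[
\mathbb{E}\sup_{f\in\mathcal{F}}\Big|\sum_{i=1}^{N}\big(|f(X_i)|^{p} - \mathbb{E}|f(X_i)|^{p}\big)\Big| \le 4pB^{p-1}\,\mathbb{E}\sup_{f\in\mathcal{F}}\Big|\sum_{i=1}^{N}\varepsilon_i f(X_i)\Big|.
\]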
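For the reader's convenience, here is a sketch of how the constant $4pB^{p-1}$ arises. It assumes the contraction $\varphi(s) = (|s|^{p}\wedge B^{p})/(pB^{p-1})$, which is one natural choice and is labeled here as an assumption of this illustration; the first inequality is the standard symmetrization bound and the last one is the contraction principle with $F(x)=x$, applied conditionally on $X_1,\dots,X_N$.

```latex
\[
\varphi(s) = \frac{|s|^{p}\wedge B^{p}}{pB^{p-1}}, \qquad \varphi(0)=0, \qquad
|\varphi'(s)| = \frac{|s|^{p-1}}{B^{p-1}}\,\mathbf{1}_{\{|s|<B\}} \le 1 ,
\]
\[
\mathbb{E}\sup_{f\in\mathcal{F}}\Big|\sum_{i=1}^{N}\big(|f(X_i)|^{p}-\mathbb{E}|f(X_i)|^{p}\big)\Big|
\le 2\,\mathbb{E}\sup_{f\in\mathcal{F}}\Big|\sum_{i=1}^{N}\varepsilon_i |f(X_i)|^{p}\Big|
= 2pB^{p-1}\,\mathbb{E}\sup_{f\in\mathcal{F}}\Big|\sum_{i=1}^{N}\varepsilon_i\,\varphi(f(X_i))\Big|
\le 4pB^{p-1}\,\mathbb{E}\sup_{f\in\mathcal{F}}\Big|\sum_{i=1}^{N}\varepsilon_i f(X_i)\Big| .
\]
```

The middle equality uses $|f| \le B$, so that $|f|^{p} = |f|^{p}\wedge B^{p} = pB^{p-1}\varphi(f)$.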

We will also use Talagrand's celebrated concentration inequality for suprema of bounded empirical processes [25]. The version from [13] presented below provides the best known constants in this inequality (we will, however, not take advantage of explicit constants). For a simple proof (with worse constants) we refer the reader to [TH [15].

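Lemma 4.8 ([13], Theorem 1.1). Let $X_1, X_2, \dots, X_N$ be independent random variables with values in a measurable space $(S, \mathcal{B})$ and let $\mathcal{F}$ be a countable class of measurable functions $f : S \to [-a, a]$, such that for all $i$, $\mathbb{E}f(X_i) = 0$. Consider the random variable

\[
Z = \sup_{f\in\mathcal{F}}\sum_{i=1}^{N} f(X_i).
\]

Then, for all $t \ge 0$,

\[
\mathbb{P}(Z \ge \mathbb{E}Z + t) \le \exp\Big(-\frac{t^{2}}{2(\sigma^{2} + 2a\,\mathbb{E}Z) + 3at}\Big),
\]

where

\[
\sigma^{2} = \sup_{f\in\mathcal{F}}\sum_{i=1}^{N}\mathbb{E}f(X_i)^{2}.
\]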
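The right-hand side of Lemma 4.8 is easy to evaluate numerically, which helps when checking which of the two regimes (the Gaussian one governed by $\sigma^2 + 2a\,\mathbb{E}Z$, or the linear one governed by $at$) dominates. The helper below is a small sketch; the function and parameter names are hypothetical and not taken from the references.

```python
import math


def klein_rio_tail_bound(t, sigma_sq, a, expected_sup):
    """Evaluate exp(-t^2 / (2*(sigma^2 + 2*a*E[Z]) + 3*a*t)).

    t            -- deviation above the mean, t >= 0
    sigma_sq     -- sup over the class of sum_i E f(X_i)^2
    a            -- uniform bound on |f|, a > 0
    expected_sup -- the expectation E[Z], assumed nonnegative here
    """
    if t < 0 or sigma_sq < 0 or a <= 0 or expected_sup < 0:
        raise ValueError("need t >= 0, sigma_sq >= 0, a > 0, expected_sup >= 0")
    if t == 0:
        return 1.0  # the bound is trivial at t = 0
    denominator = 2.0 * (sigma_sq + 2.0 * a * expected_sup) + 3.0 * a * t
    return math.exp(-t * t / denominator)


if __name__ == "__main__":
    # Small t: Gaussian-type decay; large t: exponential-type decay.
    for t in (1.0, 10.0, 100.0):
        print(t, klein_rio_tail_bound(t, sigma_sq=10.0, a=1.0, expected_sup=5.0))
```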
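Proof of Proposition 4.4. For simplicity, throughout this proof we will use the letter $C$ to denote absolute constants, whose values may change from line to line.

For $B > 1$ (to be specified later) consider

\[
\mathbb{E}\sup_{y\in S^{n-1}}\Big|\sum_{i=1}^{N}\big((|\langle X_i,y\rangle|\wedge B)^{p} - \mathbb{E}(|\langle X_i,y\rangle|\wedge B)^{p}\big)\Big|
\le 4pB^{p-1}\,\mathbb{E}\sup_{y\in S^{n-1}}\Big|\sum_{i=1}^{N}\varepsilon_i\big(|\langle X_i,y\rangle|\wedge B\big)\Big|,
\]

where the inequality follows from Corollary 4.7. The function $t \mapsto |t|\wedge B$ is a contraction, so

\[
\mathbb{E}\sup_{y\in S^{n-1}}\Big|\sum_{i=1}^{N}\big((|\langle X_i,y\rangle|\wedge B)^{p} - \mathbb{E}(|\langle X_i,y\rangle|\wedge B)^{p}\big)\Big|
\le 8pB^{p-1}\,\mathbb{E}\sup_{y\in S^{n-1}}\Big|\sum_{i=1}^{N}\varepsilon_i\langle X_i,y\rangle\Big|
\le 8pB^{p-1}\,\mathbb{E}\Big|\sum_{i=1}^{N}\varepsilon_i X_i\Big|
\le 8pB^{p-1}\sqrt{Nn}.
\]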

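Since by (2.3), $\mathbb{E}(|\langle X_i,y\rangle|\wedge B)^{2p} \le C^{2p}p^{2p}$, Lemma 4.8 implies that for $t \ge 1$, with probability at least

\[
1 - \exp\Big(-\frac{64p^{2}B^{2p-2}t^{2}Nn}{2NC^{2p}p^{2p} + 32pB^{2p-1}\sqrt{Nn} + 24pB^{2p-1}t\sqrt{Nn}}\Big)
\ge 1 - \exp\big(-c_p\min(t^{2}nB^{2p-2},\, t\sqrt{Nn}/B)\big), \tag{4.4}
\]

one has

\[
\sup_{y\in S^{n-1}}\Big|\sum_{i=1}^{N}\big((|\langle X_i,y\rangle|\wedge B)^{p} - \mathbb{E}(|\langle X_i,y\rangle|\wedge B)^{p}\big)\Big| \le 16pB^{p-1}t\sqrt{Nn}. \tag{4.5}
\]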
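Observe that

\[
\begin{aligned}
\sup_{y\in S^{n-1}}\Big|\frac{1}{N}\sum_{i=1}^{N}\big(|\langle X_i,y\rangle|^{p} - \mathbb{E}|\langle X_i,y\rangle|^{p}\big)\Big|
&\le \sup_{y\in S^{n-1}}\Big|\frac{1}{N}\sum_{i=1}^{N}\big((|\langle X_i,y\rangle|\wedge B)^{p} - \mathbb{E}(|\langle X_i,y\rangle|\wedge B)^{p}\big)\Big| \\
&\quad + \sup_{y\in S^{n-1}}\frac{1}{N}\sum_{i=1}^{N}\big(|\langle X_i,y\rangle|^{p} - B^{p}\big)\mathbf{1}_{\{|\langle X_i,y\rangle|>B\}} \\
&\quad + \sup_{y\in S^{n-1}}\frac{1}{N}\,\mathbb{E}\sum_{i=1}^{N}\big(|\langle X_i,y\rangle|^{p} - B^{p}\big)\mathbf{1}_{\{|\langle X_i,y\rangle|>B\}}.
\end{aligned}
\]

Each of the three terms obtained is estimated separately, with the first term already discussed in (4.5) and (4.4). By (2.3) and Chebyshev's inequality we have

\[
\mathbb{E}|\langle X_i,y\rangle|^{p}\mathbf{1}_{\{|\langle X_i,y\rangle|>B\}} \le \|\langle X_i,y\rangle\|_{2p}^{p}\sqrt{\mathbb{P}(|\langle X_i,y\rangle| > B)} \le C^{p}p^{p}e^{-B/C}.
\]

Together with the previous inequalities this implies that

\[
\sup_{y\in S^{n-1}}\Big|\frac{1}{N}\sum_{i=1}^{N}\big(|\langle X_i,y\rangle|^{p} - \mathbb{E}|\langle X_i,y\rangle|^{p}\big)\Big|
\le 16pB^{p-1}t\sqrt{\frac{n}{N}} + \sup_{y\in S^{n-1}}\frac{1}{N}\sum_{i=1}^{N}|\langle X_i,y\rangle|^{p}\mathbf{1}_{\{|\langle X_i,y\rangle|>B\}} + C^{p}p^{p}e^{-B/C}, \tag{4.6}
\]

with probability at least

\[
1 - \exp\big(-c_p\min(t^{2}nB^{2p-2},\, t\sqrt{Nn}/B)\big).
\]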
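Two elementary facts underlie the decomposition and the tail estimate used to reach (4.6). They are recorded here as a sketch, with $x$ a real number and $\xi$ a generic random variable playing the role of $\langle X_i,y\rangle$ (both symbols are introduced only for this illustration): the truncation identity, and the Cauchy-Schwarz inequality applied to $|\xi|^{p}$ and the indicator of the tail event.

```latex
\[
|x|^{p} - (|x|\wedge B)^{p} = \big(|x|^{p} - B^{p}\big)\mathbf{1}_{\{|x|>B\}}
\le |x|^{p}\,\mathbf{1}_{\{|x|>B\}} ,
\]
\[
\mathbb{E}|\xi|^{p}\mathbf{1}_{\{|\xi|>B\}}
\le \big(\mathbb{E}|\xi|^{2p}\big)^{1/2}\,\mathbb{P}(|\xi|>B)^{1/2}
= \|\xi\|_{2p}^{p}\,\sqrt{\mathbb{P}(|\xi|>B)} .
\]
```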



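Thus it remains to estimate $\sup_{y\in S^{n-1}}\sum_{i=1}^{N}|\langle X_i,y\rangle|^{p}\mathbf{1}_{\{|\langle X_i,y\rangle|>B\}}$. To this end we use Theorem 3.6 and Remark 3.10. It follows that for $s \ge 1$, with probability at least $1 - e^{-cs\sqrt{n}}$, we have, for all $m \le N$ and all $z \in S^{N-1}$ with $|\operatorname{supp} z| = m$,

\[
\Big|\sum_{i=1}^{N} z_i X_i\Big| \le Cs\Big(\sqrt{n} + \sqrt{m}\log\frac{2N}{m}\Big). \tag{4.7}
\]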
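Dualizing this estimate and using the fact that for $p \ge 2$ the $\ell_p$ norm is dominated by the $\ell_2$ norm, we obtain, for any set $E \subset \{1, \dots, N\}$,

\[
\sup_{y\in S^{n-1}}\Big(\sum_{i\in E}|\langle X_i,y\rangle|^{p}\Big)^{1/p}
\le \sup_{y\in S^{n-1}}\Big(\sum_{i\in E}|\langle X_i,y\rangle|^{2}\Big)^{1/2}
\le Cs\Big(\sqrt{n} + \sqrt{|E|}\log\frac{2N}{|E|}\Big). \tag{4.8}
\]

For an arbitrary $y \in S^{n-1}$ let $E_B = E_B(y) := \{i \le N : |\langle X_i,y\rangle| > B\}$. Then, by (4.8),

\[
B|E_B|^{1/2} \le \Big(\sum_{i\in E_B}|\langle X_i,y\rangle|^{2}\Big)^{1/2} \le Cs\Big(\sqrt{n} + \sqrt{|E_B|}\log\frac{2N}{|E_B|}\Big).
\]

Thus, whenever

\[
B \ge 2Cs\log\frac{2N}{n}, \tag{4.9}
\]

we obtain (for a different absolute constant $C$)

\[
|E_B| \le Cs^{2}nB^{-2}.
\]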

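This combined with (4.8) implies, after taking the $p$-th powers and again adjusting constants, that with probability at least $1 - e^{-cs\sqrt{n}}$, for all $y \in S^{n-1}$,

\[
\sum_{i=1}^{N}|\langle X_i,y\rangle|^{p}\mathbf{1}_{\{|\langle X_i,y\rangle|>B\}} = \sum_{i\in E_B}|\langle X_i,y\rangle|^{p}
\le C^{p}s^{p}\Big(n^{p/2} + n^{p/2}s^{p}B^{-p}\log^{p}\Big(\frac{2N}{n}\Big)\Big).
\]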

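Setting $B = 2Cs\log(2N/n)$, so that (4.9) is satisfied, and combining the resulting estimate with (4.6), we get

\[
\sup_{y\in S^{n-1}}\Big|\frac{1}{N}\sum_{i=1}^{N}\big(|\langle X_i,y\rangle|^{p} - \mathbb{E}|\langle X_i,y\rangle|^{p}\big)\Big|
\le C^{p}\,t\,s^{p-1}\log^{p-1}\Big(\frac{2N}{n}\Big)\sqrt{\frac{n}{N}} + C^{p}s^{p}\,\frac{n^{p/2}}{N} + C^{p}p^{p}\Big(\frac{n}{N}\Big)^{s}
\]

with probability at least

\[
1 - \exp(-cs\sqrt{n}) - \exp\Big(-c_p\min\Big(t^{2}s^{2p-2}\,n\log^{2p-2}(2N/n),\ \frac{t\sqrt{Nn}}{s\log(2N/n)}\Big)\Big).
\]

This completes the proof of Proposition 4.4. □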



Remark 4.9. Let $G \in \mathbb{R}^n$ be a standard Gaussian vector with the identity as the covariance matrix and let $h$ be a standard Gaussian random variable. Assume that $h$ and $G$ are independent and put $X = hG \in \mathbb{R}^n$. Clearly its covariance matrix is the identity and it is easy to check that $\|\langle X, y\rangle\|_{\psi_1} \le c|y|$ for every $y \in \mathbb{R}^n$, where $c$ is a numerical constant. Nevertheless, it is known from ^ that $X$ does not satisfy the conclusion of Lemma 3.1; in fact the density of $X$ is not log-concave. Now let us consider the matrix $A = A^{(N)}$ with i.i.d. copies $X_i = h_iG_i$, $i = 1, \dots, N$, as columns, with $N \le e^{n}$, where $(h_i)$ are i.i.d. copies of $h$ and similarly $(G_i)$ are i.i.d. copies of $G$, with $(h_i)$ and $(G_i)$ independent. One can check that



where $c > 0$ is a numerical constant. Thus $\|A\| \ge \sqrt{cn\log N}$. This example shows that the sub-exponential decay of linear forms ($\psi_1$ norm bounded) is not sufficient for our problem.

Remark 4.10. In comparison, a sub-gaussian decay of linear forms is sufficient. Indeed, it is known (see for instance [19]) that if there exists $c > 0$ such that $\mathbb{E}\exp\big(|c\langle X, y\rangle|^{2}\big) \le 2$ for every $y \in S^{n-1}$, then (1.5) holds with probability larger than $1 - \exp(-c'n)$ for some numerical constant $c' > 0$.

Remark 4.11. Another, not necessarily log-concave, example for which the conclusions of Theorems 3.6 and 4.1 are valid is obtained when $\|\langle X, y\rangle\|_{\psi_1} \le c|y|$ for every $y \in \mathbb{R}^n$ and $|X| \le C\sqrt{n}$, where $c, C > 0$ are numerical constants.
4.2 Additional observations

We note several observations for norms of random matrices from $\ell_2$ to $\ell_p$.

Corollary 4.12. For $1 \le N \le e^{\sqrt{n}}$, let $\Gamma$ be a random $N \times n$ matrix with rows $X_1, \dots, X_N$. Then for $p \ge 2$, with probability at least $1 - e^{-c_p\sqrt{n}}$ (where $c_p > 0$ depends only on $p$),

\[
\|\Gamma\|_{\ell_2\to\ell_p} \le C_p\big(N^{1/p} + n^{1/2}\big), \tag{4.10}
\]

with $C_p > 0$ depending only on $p$. Moreover

\[
c_pN^{1/p} + c\sqrt{n} \le \mathbb{E}\|\Gamma\|_{\ell_2\to\ell_p} \le C_p\big(N^{1/p} + n^{1/2}\big), \tag{4.11}
\]

where $C_p, c_p > 0$ depend only on $p$ and $c > 0$ is an absolute constant.

Proof. Inequality (4.10) for $N \le n$ follows from Theorem 3.6 and the comparison between $\ell_p$ norms. For $N > n$, the inequality follows from Proposition 4.4.

Since by log-concavity, moments and quantiles of $\|\Gamma\|_{\ell_2\to\ell_p}$ are equivalent, (4.10) implies that

\[
\mathbb{E}\|\Gamma\|_{\ell_2\to\ell_p} \le C_p\big(N^{1/p} + n^{1/2}\big).
\]

On the other hand, a single row of $\Gamma$ has expected Euclidean norm of the order of $\sqrt{n}$ and a single column of $\Gamma$ has expected $\|\cdot\|_p$ norm of the order of $c(p)N^{1/p}$, so the left-hand side of (4.11) follows trivially. □

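Corollary 4.13. For $1 \le N \le e^{\sqrt{n}}$, let $\Gamma$ be a random $N \times n$ matrix with rows $X_1, \dots, X_N$. Then for $p \in [1,2)$, with probability at least $1 - e^{-c\sqrt{n}}$ (where $c > 0$ is an absolute constant),

\[
\|\Gamma\|_{\ell_2\to\ell_p} \le C\big(N^{1/p} + N^{1/p-1/2}n^{1/2}\big) \tag{4.12}
\]

for some absolute constant $C > 0$. Moreover

\[
c\big(N^{1/p} + N^{1/p-1/2}n^{1/2}\big) \le \mathbb{E}\|\Gamma\|_{\ell_2\to\ell_p} \le C\big(N^{1/p} + N^{1/p-1/2}n^{1/2}\big), \tag{4.13}
\]

where $C, c > 0$ are absolute constants.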
Proof. Inequality (4.12) and the right-hand side of (4.13) follow from the corresponding results for $p = 2$, since

\[
\|\Gamma\|_{\ell_2\to\ell_p} \le N^{1/p-1/2}\|\Gamma\|_{\ell_2\to\ell_2}.
\]

To prove the left-hand side of (4.13), it is enough to notice that if $1/q + 1/p = 1$, then

\[
\|\Gamma\|_{\ell_2\to\ell_p} \ge N^{-1/q}\Big|\sum_{i=1}^{N} X_i\Big|,
\]

and the expected $\ell_p$ norm of a single column of $\Gamma$ is at least $cN^{1/p}$. □

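One can also obtain an almost-isometric result for $p \in [1,2)$.

Theorem 4.14. Let $X_1, \dots, X_N$ be i.i.d. random vectors, distributed according to an isotropic, log-concave probability measure on $\mathbb{R}^n$. For any $p \in [1,2)$ and for every $\varepsilon \in (0,1)$ and $t \ge 1$, there exists $C(\varepsilon,t) > 0$, such that if $C(\varepsilon,t)\,n \le N \le e^{\sqrt{n}}$, then with probability at least $1 - e^{-ct\sqrt{n}}$ (where $c > 0$ is an absolute constant),

\[
\sup_{y\in S^{n-1}}\Big|\frac{1}{N}\sum_{i=1}^{N}\big(|\langle X_i,y\rangle|^{p} - \mathbb{E}|\langle X_i,y\rangle|^{p}\big)\Big| \le \varepsilon. \tag{4.14}
\]

Moreover, one can take $C(\varepsilon,t) = Ct^{2p}\varepsilon^{-2}\log^{2p-2}(2t^{2p}\varepsilon^{-2})$, where $C > 0$ is an absolute constant.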

Proof. Since the proof differs only by technical details from the corresponding argument for $p \ge 2$, we will just indicate the necessary changes. We will use the notation from the proof of Proposition 4.4.

Just as before, we truncate at the level of $Ct\log(2N/n)$ and use the contraction principle to handle the bounded part of the process. As for the unbounded part, we also proceed as before; however, now we use the comparison between the $\ell_2$ and $\ell_p$ norms for $p < 2$ and $k = |E_B| \le n$, which yields

\[
\sup_{y\in S^{n-1}}\Big|\frac{1}{N}\sum_{i=1}^{N}\big(|\langle X_i,y\rangle|^{p} - \mathbb{E}|\langle X_i,y\rangle|^{p}\big)\Big|
\le Ct^{p}\log^{p-1}\Big(\frac{2N}{n}\Big)\sqrt{\frac{n}{N}} + \frac{C^{p}t^{p}n}{N} + C^{p}p^{p}\Big(\frac{n}{N}\Big)^{t}
\]

with probability at least

\[
1 - \exp(-ct\sqrt{n}) - \exp\big(-c\min\big(t^{2p}\,n\log^{2p-2}(2N/n),\ \sqrt{Nn}/\log(2N/n)\big)\big)
\]

(the constants in the exponents can be made independent of $p$, since now $p$ runs over a bounded interval). This allows us to finish the proof. □



Remark 4.15. The isomorphic result for $p = 1$ was proven in [10]. The same paper also considers $p \in (0,1)$.

4.3 Elementary approach for p = 2 

As announced earlier, we will now briefly describe a more elementary proof of Theorem 4.1 and Theorem 4.2 for $p = 2$. In this case, the classical Bernstein inequality and a net argument on the sphere may replace the contraction principle and concentration of measure for empirical processes, which have been used (via Lemma 4.8) to prove (4.5). The remaining part of the proof is left unchanged.

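The key point is the following well-known observation:

Lemma 4.16. Let $X_i$, $i = 1, 2, \dots, N$, be arbitrary vectors in $\mathbb{R}^n$. Let $\varepsilon \in (0,1)$ and let $\mathcal{N}$ be a $c\varepsilon$-net of $S^{n-1}$, for some constant $c \in (0,1)$. If we have

\[
\sup_{y\in\mathcal{N}}\Big|\frac{1}{N}\sum_{i=1}^{N}\big(\langle X_i,y\rangle^{2} - 1\big)\Big| \le \varepsilon,
\]

then

\[
\sup_{y\in S^{n-1}}\Big|\frac{1}{N}\sum_{i=1}^{N}\big(\langle X_i,y\rangle^{2} - 1\big)\Big| \le c'\varepsilon,
\]

where $c'$ depends on $c$.

We postpone the proof of this lemma and pass to the proof of Theorems 4.1 and 4.2.

Fix a $c\varepsilon$-net $\mathcal{N}$ of $S^{n-1}$ of cardinality at most $(3/c\varepsilon)^{n}$, and $B > 0$ to be determined later. Pick an arbitrary $y \in S^{n-1}$.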

For the reader's convenience we recall Bernstein's inequality.

Proposition 4.17 (Bernstein's inequality). Let $Z_i$ be independent random variables, centered and such that $|Z_i| \le a$ for all $1 \le i \le N$. Put $Z = \frac{1}{N}\sum_{i=1}^{N} Z_i$. Then for all $\tau \ge 0$,

\[
\mathbb{P}(Z \ge \tau) \le \exp\Big(-\frac{\tau^{2}N}{2(\sigma^{2} + a\tau/3)}\Big),
\]

where

\[
\sigma^{2} = \frac{1}{N}\sum_{i=1}^{N}\operatorname{Var}(Z_i).
\]
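As a sanity check of the form of Bernstein's inequality for averages, the sketch below compares the bound $\exp(-\tau^{2}N/(2(\sigma^{2}+a\tau/3)))$ with a Monte Carlo estimate of the tail. It assumes, purely for illustration, that the $Z_i$ are uniform on $[-1,1]$ (so $a = 1$ and $\sigma^{2} = 1/3$); the function names are hypothetical.

```python
import math
import random


def bernstein_bound(tau, num_terms, sigma_sq, a):
    """Evaluate exp(-tau^2 * N / (2 * (sigma^2 + a * tau / 3))) for tau > 0."""
    return math.exp(-tau * tau * num_terms / (2.0 * (sigma_sq + a * tau / 3.0)))


def empirical_tail(tau, num_terms, trials, seed):
    """Monte Carlo estimate of P(mean of N uniform[-1, 1] variables >= tau)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        mean = sum(rng.uniform(-1.0, 1.0) for _ in range(num_terms)) / num_terms
        if mean >= tau:
            hits += 1
    return hits / trials


if __name__ == "__main__":
    tau, num_terms = 0.2, 100
    print("Bernstein bound:", bernstein_bound(tau, num_terms, 1.0 / 3.0, 1.0))
    print("empirical tail :", empirical_tail(tau, num_terms, trials=2000, seed=0))
```

With $\tau = tB\sqrt{n/N}$, $a = B^{2}$ and $\sigma^{2}$ bounded by a constant, the exponent is of order $\min(t^{2}B^{2}n,\ t\sqrt{Nn}/B)$, which is the form in which the inequality is used in this subsection.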



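In our case $Z_i = (|\langle X_i,y\rangle|\wedge B)^{2} - \mathbb{E}(|\langle X_i,y\rangle|\wedge B)^{2}$, for $1 \le i \le N$, and $a = B^{2}$. Since $\mathbb{E}|\langle X_i,y\rangle|^{2} = 1$, (2.3) implies

\[
\operatorname{Var}(Z_i) \le \mathbb{E}(|\langle X_i,y\rangle|\wedge B)^{4} \le c.
\]

Setting $\tau = tB\sqrt{n/N}$ we infer that

\[
\Big|\frac{1}{N}\sum_{i=1}^{N}\big((|\langle X_i,y\rangle|\wedge B)^{2} - \mathbb{E}(|\langle X_i,y\rangle|\wedge B)^{2}\big)\Big| \ge tB\sqrt{\frac{n}{N}}
\]

with probability at most

\[
\exp\big(-c\min(t^{2}B^{2}n,\ t\sqrt{Nn}/B)\big).
\]

By the union bound,

\[
\sup_{y\in\mathcal{N}}\Big|\frac{1}{N}\sum_{i=1}^{N}\big((|\langle X_i,y\rangle|\wedge B)^{2} - \mathbb{E}(|\langle X_i,y\rangle|\wedge B)^{2}\big)\Big| \le tB\sqrt{\frac{n}{N}}, \tag{4.15}
\]

with probability at least

\[
1 - \exp\Big(n\log\frac{3}{c\varepsilon} - c\min(t^{2}nB^{2},\ t\sqrt{Nn}/B)\Big).
\]

This estimate corresponds to (4.5).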

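Using this estimate with $B = Ct\log(2N/n)$ and handling the unbounded part the same way as in Proposition 4.4 (see the argument that follows (4.5)), we obtain

\[
\sup_{y\in\mathcal{N}}\Big|\frac{1}{N}\sum_{i=1}^{N}\big(\langle X_i,y\rangle^{2} - \mathbb{E}\langle X_i,y\rangle^{2}\big)\Big|
\le Ct^{2}\log\Big(\frac{2N}{n}\Big)\sqrt{\frac{n}{N}} + \frac{C^{2}t^{2}n}{N} + 4C^{2}\Big(\frac{n}{N}\Big)^{t} \tag{4.16}
\]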
with probability at least

\[
1 - \exp(-ct\sqrt{n}) - \exp\Big(n\log\frac{3}{c\varepsilon} - c\min\Big(t^{4}n\log^{2}(2N/n),\ \frac{\sqrt{Nn}}{C\log(2N/n)}\Big)\Big).
\]

This corresponds to the estimates in Proposition 4.4 (for $s = t$).

Now, for $N \ge C(\varepsilon,t)\,n$, and $C(\varepsilon,t)$ sufficiently large, the right-hand side of (4.16) is at most $\varepsilon$ and $3/(c\varepsilon) \le 2N/n$, so the probability above is at least $1 - \exp(-ct\sqrt{n})$. So with the same probability we get

\[
\sup_{y\in\mathcal{N}}\Big|\frac{1}{N}\sum_{i=1}^{N}\big(\langle X_i,y\rangle^{2} - \mathbb{E}\langle X_i,y\rangle^{2}\big)\Big| \le \varepsilon.
\]

We can now conclude by Lemma 4.16 applied pointwise with $X_i = X_i(\omega)$ for $\omega$ from the event on which our estimates hold (recall that by the isotropicity assumption we have $\mathbb{E}|\langle X_i,y\rangle|^{2} = 1$).

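Proof of Lemma 4.16. Consider the semi-norm $\|\cdot\|$ on $\mathbb{R}^n$ defined by

\[
\|y\| = \Big(\frac{1}{N}\sum_{i=1}^{N}\langle X_i,y\rangle^{2}\Big)^{1/2}
\]

for $y \in \mathbb{R}^n$. Our assumptions imply that, for every $y \in \mathcal{N}$,

\[
1 - \varepsilon \le \sqrt{1-\varepsilon} \le \|y\| \le \sqrt{1+\varepsilon} \le 1 + \varepsilon/2.
\]

The triangle inequality and homogeneity of $\|\cdot\|$ imply, by a standard argument, that

\[
\sup_{y\in S^{n-1}}\|y\| \le (1+\varepsilon/2)(1-c\varepsilon)^{-1} \le 1 + \delta,
\]

where

\[
\delta = \frac{1 + 5c - 3c^{2}}{2(1-c)}\,\varepsilon.
\]

To get a lower estimate, write an arbitrary $y \in S^{n-1}$ in the form $y = y_1 + c\varepsilon y_2$, with $y_1 \in \mathcal{N}$ and $|y_2| \le 1$. Then $\|y\| \ge \|y_1\| - c\varepsilon\|y_2\| \ge (1-\varepsilon) - c\varepsilon(1+\delta) \ge 1 - \delta_1$, where

\[
\delta_1 = \frac{2 + c + 3c^{2} - 3c^{3}}{2(1-c)}\,\varepsilon.
\]

Thus for all $y \in S^{n-1}$,

\[
\big|\|y\| - 1\big| \le c_1\varepsilon
\]

for some $c_1$ depending only on $c$. In particular $\|y\| \in [0, 1 + c_1]$. Using the fact that the function $t \mapsto t^{2}$ is Lipschitz with constant $2(1+c_1)$ on the interval $[0, 1+c_1]$, we conclude that

\[
\sup_{y\in S^{n-1}}\Big|\frac{1}{N}\sum_{i=1}^{N}\big(\langle X_i,y\rangle^{2} - 1\big)\Big| \le c'\varepsilon,
\]

where $c' = 2c_1(1+c_1)$ depends only on $c$. □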
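The "standard argument" invoked in the proof of Lemma 4.16 for the upper bound can be made explicit as follows. This is a sketch; the symbol $M$ is auxiliary notation introduced here, and $M$ is finite because a semi-norm on $\mathbb{R}^n$ is continuous.

```latex
Let $M := \sup_{y\in S^{n-1}}\|y\|$. For $y\in S^{n-1}$ pick $y_1\in\mathcal{N}$
with $|y-y_1|\le c\varepsilon$. By the triangle inequality and homogeneity,
\[
\|y\| \le \|y_1\| + \|y-y_1\| \le \sup_{u\in\mathcal{N}}\|u\| + c\varepsilon\, M .
\]
Taking the supremum over $y\in S^{n-1}$ and rearranging gives
\[
M \le (1-c\varepsilon)^{-1}\sup_{u\in\mathcal{N}}\|u\|
\le (1+\varepsilon/2)(1-c\varepsilon)^{-1} .
\]
```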
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